(57) Abstract

Disclosed herein are constitutively activated, non-endogenous versions of endogenous human G protein-coupled receptors comprising (a) the following amino acid sequence region (C-terminus to N-terminus orientation) and/or (b) the following nucleic acid sequence region (3' to 5' orientation) traversing the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions of the GPCR: (a) $P^{1}\,AA_{15}\,X$ and/or (b) $P^{codon}\,(AA\text{-}codon)_{15}\,X_{codon}$, respectively. In a most preferred embodiment, $P^{1}$ and $P^{codon}$ are endogenous proline and an endogenous nucleic acid encoding region encoding proline, respectively, located within TM6 of the non-endogenous GPCR; $AA_{15}$ and $(AA\text{-}codon)_{15}$ are 15 endogenous amino acid residues and 15 codons encoding endogenous amino acid residues, respectively; and $X$ and $X_{codon}$ are non-endogenous lysine and a non-endogenous nucleic acid encoding region encoding lysine, respectively, located within IC3 of the non-endogenous GPCR. Because it is most preferred that the non-endogenous human GPCRs which incorporate these mutations are incorporated into mammalian cells and utilized for the screening of candidate compounds, the non-endogenous human GPCR incorporating the mutation need not be purified and isolated per se (i.e., these are incorporated within the cellular membrane of a mammalian cell), although such purified and isolated non-endogenous human GPCRs are well within the purview of this disclosure.

NON-ENDOGENOUS, CONSTITUTIVELY ACTIVATED
HUMAN G PROTEIN-COUPLED RECEPTORS

The benefits of commonly owned U.S. Serial Number 09/170,496, filed October 13, 1998; U.S. Serial Number 08/839,449, filed April 14, 1997 (now abandoned); U.S. Serial Number 09/060,188, filed April 14, 1998; U.S. Provisional Number 60/090 J83, filed June 26, 1998; and U.S. Provisional Number 60/095,677, filed on August 7, 1998, are hereby claimed. Each of the foregoing applications is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION 

The invention disclosed in this patent document relates to transmembrane receptors, and more particularly to human G protein-coupled receptors (GPCRs) which have been altered such that the altered GPCRs are constitutively activated. Most preferably, the altered human GPCRs are used for the screening of therapeutic compounds.
BACKGROUND OF THE INVENTION

Although a number of receptor classes exist in humans, by far the most abundant and therapeutically relevant is represented by the G protein-coupled receptor (GPCR or GPCRs) class. It is estimated that there are some 100,000 genes within the human genome, and of these, approximately 2%, or 2,000 genes, are estimated to code for GPCRs. Of these, there are approximately 100 GPCRs for which the endogenous ligand that binds to the GPCR has been identified. Because of the significant time lag that exists between the discovery of an endogenous GPCR and its endogenous ligand, it can be presumed that the remaining 1,900 GPCRs will be identified and characterized long before the endogenous ligands for these receptors are identified. Indeed, the rapidity with which the Human Genome Project is sequencing the 100,000 human genes indicates that the remaining human GPCRs will be fully sequenced within the next few years. Nevertheless, and despite the efforts to sequence the human genome, it remains very unclear how scientists will be able to rapidly, effectively and efficiently exploit this information to improve and enhance the human condition. The present invention is geared towards this important objective.

Receptors, including GPCRs, for which the endogenous ligand has been identified are referred to as "known" receptors, while receptors for which the endogenous ligand has not been identified are referred to as "orphan" receptors. This distinction is not merely semantic, particularly in the case of GPCRs. GPCRs represent an important area for the development of pharmaceutical products: from approximately 20 of the 100 known GPCRs, 60% of all prescription pharmaceuticals have been developed. Thus, the orphan GPCRs are to the pharmaceutical industry what gold was to California in the late 19th century: an opportunity for growth, expansion, enhancement and development. A serious drawback exists, however, with orphan receptors relative to the discovery of novel therapeutics. This is because the traditional approach to the discovery and development of pharmaceuticals has required access to both the receptor and its endogenous ligand. Thus, heretofore, orphan GPCRs have presented the art with a tantalizing and undeveloped resource for the discovery of pharmaceuticals.

Under the traditional approach to the discovery of potential therapeutics, it is generally the case that the receptor is first identified. Before drug discovery efforts can be initiated, elaborate, time-consuming and expensive procedures are typically put into place in order to identify, isolate and generate the receptor's endogenous ligand; this process can require from 3 to 10 years per receptor, at a cost of about $5 million (U.S.) per receptor. These time and financial resources must be expended before the traditional approach to drug discovery can commence. This is because traditional drug discovery techniques rely upon so-called "competitive binding assays" whereby putative therapeutic agents are "screened" against the receptor in an effort to discover compounds that either block the endogenous ligand from binding to the receptor ("antagonists"), or enhance or mimic the effects of the ligand binding to the receptor ("agonists"). The overall objective is to identify compounds that prevent cellular activation when the ligand binds to the receptor (the antagonists), or that enhance or increase cellular activity that would otherwise occur if the ligand were properly binding with the receptor (the agonists). Because the endogenous ligands for orphan GPCRs are by definition not identified, it is not possible to discover novel and unique therapeutics for these receptors using traditional drug discovery techniques. The present invention, as will be set forth in greater detail below, overcomes these and other severe limitations created by such traditional drug discovery techniques.

GPCRs share a common structural motif. All these receptors have seven sequences of between 22 and 24 hydrophobic amino acids that form seven alpha helices, each of which spans the membrane (each span is identified by number, i.e., transmembrane-1 (TM-1), transmembrane-2 (TM-2), etc.). The transmembrane helices are joined by strands of amino acids between transmembrane-2 and transmembrane-3, transmembrane-4 and transmembrane-5, and transmembrane-6 and transmembrane-7 on the exterior, or "extracellular" side, of the cell membrane (these are referred to as "extracellular" regions 1, 2 and 3 (EC-1, EC-2 and EC-3), respectively). The transmembrane helices are also joined by strands of amino acids between transmembrane-1 and transmembrane-2, transmembrane-3 and transmembrane-4, and transmembrane-5 and transmembrane-6 on the interior, or "intracellular" side, of the cell membrane (these are referred to as "intracellular" regions 1, 2 and 3 (IC-1, IC-2 and IC-3), respectively). The "carboxy" ("C") terminus of the receptor lies in the intracellular space within the cell, and the "amino" ("N") terminus of the receptor lies in the extracellular space outside of the cell. The general structure of G protein-coupled receptors is depicted in Figure 1.
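The loop nomenclature above follows a regular pattern: each intracellular loop starts after an odd-numbered helix and each extracellular loop after an even-numbered helix. The short Python sketch below encodes that pattern as an illustration only; the function name `flanking_helices` and the "IC-n"/"EC-n" label format are hypothetical choices made to mirror the labels used in the text, not part of the disclosure.

```python
def flanking_helices(loop: str) -> tuple[int, int, str]:
    """Return (N-side helix, C-side helix, membrane side) for a loop label.

    Hypothetical helper that encodes the topology described above:
    IC-1, IC-2 and IC-3 join TM-1/TM-2, TM-3/TM-4 and TM-5/TM-6 (intracellular);
    EC-1, EC-2 and EC-3 join TM-2/TM-3, TM-4/TM-5 and TM-6/TM-7 (extracellular).
    """
    kind, _, number = loop.partition("-")
    k = int(number)  # raises ValueError for labels without a loop number
    if not 1 <= k <= 3:
        raise ValueError(f"loop number must be 1, 2 or 3: {loop!r}")
    if kind == "IC":
        first = 2 * k - 1  # intracellular loops start after an odd-numbered helix
        side = "intracellular"
    elif kind == "EC":
        first = 2 * k  # extracellular loops start after an even-numbered helix
        side = "extracellular"
    else:
        raise ValueError(f"unknown loop type: {loop!r}")
    return first, first + 1, side


if __name__ == "__main__":
    for label in ("IC-1", "IC-2", "IC-3", "EC-1", "EC-2", "EC-3"):
        print(label, flanking_helices(label))
```

In this encoding IC-3 maps to TM-5/TM-6, which is the loop adjoining the TM6 region on which this invention focuses.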

Generally, when an endogenous ligand binds with the receptor (often referred to as "activation" of the receptor), there is a change in the conformation of the intracellular region that allows for coupling between the intracellular region and an intracellular "G protein." Gq, Gs, Gi and Go are G proteins that have been identified to date, although other G proteins exist. Endogenous ligand-activated GPCR coupling with the G protein begins a signaling cascade process (referred to as "signal transduction"). Under normal conditions, signal transduction ultimately results in cellular activation or cellular inhibition. It is thought that the IC-3 loop as well as the carboxy terminus of the receptor interact with the G protein. A principal focus of this invention is directed to the transmembrane-6 (TM6) region and the intracellular-3 (IC3) region of the GPCR.

Under physiological conditions, GPCRs exist in the cell membrane in equilibrium between two different conformations: an "inactive" state and an "active" state. As shown schematically in Figure 2, a receptor in an inactive state is unable to link to the intracellular signaling transduction pathway to produce a biological response. Changing the receptor conformation to the active state allows linkage to the transduction pathway (via the G protein) and produces a biological response.

A receptor may be stabilized in an active state by an endogenous ligand or a compound such as a drug. Recent discoveries, including but not exclusively limited to modifications to the amino acid sequence of the receptor, provide means other than endogenous ligands or drugs to promote and stabilize the receptor in the active state conformation. These means effectively stabilize the receptor in an active state by simulating the effect of an endogenous ligand binding to the receptor. Stabilization by such ligand-independent means is termed "constitutive receptor activation."
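The two-state equilibrium described above can be illustrated with a deliberately simplified model. This sketch is an assumption added for illustration and is not a method disclosed in this document: it takes an equilibrium constant $K = [R^{*}]/[R]$ between the active ($R^{*}$) and inactive ($R$) conformations, so that the fraction of receptors in the active state is $K/(1 + K)$. In this simplified picture, constitutive activation corresponds to a change, such as a mutation, that raises $K$ with no ligand bound. The function name `active_fraction` and the K values below are hypothetical.

```python
def active_fraction(k_equilibrium: float) -> float:
    """Fraction of receptors in the active state for K = [R*]/[R].

    Simplified two-state model (an illustrative assumption): at equilibrium
    [R*] = K * [R], so [R*] / ([R] + [R*]) = K / (1 + K).
    """
    if k_equilibrium < 0:
        raise ValueError("equilibrium constant must be non-negative")
    return k_equilibrium / (1.0 + k_equilibrium)


if __name__ == "__main__":
    # Hypothetical K values: a mostly inactive receptor versus a mutant whose
    # equilibrium has been shifted toward the active state without ligand.
    for label, k in (("wild type (hypothetical K)", 0.05),
                     ("mutant (hypothetical K)", 1.0)):
        print(f"{label}: {active_fraction(k):.3f} active")
```

The model ignores G protein coupling and receptor density; it only shows why shifting the equilibrium raises basal signaling in the absence of ligand.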

As noted above, the use of an orphan receptor for screening purposes has not been possible. This is because the traditional "dogma" regarding screening of compounds mandates that the ligand for the receptor be known. By definition, then, this approach has no applicability with respect to orphan receptors. Thus, by adhering to this dogmatic approach to the discovery of therapeutics, the art, in essence, has taught and has been taught to forsake the use of orphan receptors unless and until the endogenous ligand for the receptor is discovered. Given that there are an estimated 2,000 G protein-coupled receptors, the majority of which are orphan receptors, such dogma castigates a creative, unique and distinct approach to the discovery of therapeutics.

Information regarding the nucleic acid and/or amino acid sequences of a variety of GPCRs is summarized below in Table A. Because an important focus of the invention disclosed herein is directed towards orphan GPCRs, many of the below-cited references are related to orphan GPCRs. However, this list is not intended to imply, nor is this list to be construed, legally or otherwise, that the invention disclosed herein is only applicable to orphan GPCRs or the specific GPCRs listed below. Additionally, certain receptors that have been isolated are not the subject of publications per se; for example, reference is made to a G Protein-Coupled Receptor database on the "world-wide web" (neither the named inventors nor the assignee have any affiliation with this site) that lists GPCRs. Other GPCRs are the subject of patent applications owned by the present assignee and these are not listed below (including GPR3, GPR6 and GPR12; see U.S. Provisional Number 60/094879):



Table A

Receptor Name    Publication Reference
GPR1             23 Genomics 609 (1994)
                 14 DNA and Cell Biology 25 (1995)
GPR5             14 DNA and Cell Biology 25 (1995)
GPR7             28 Genomics 84 (1995)
GPR8             28 Genomics 84 (1995)
GPR9             184 J. Exp. Med. 963 (1996)
GPR10            29 Genomics 335 (1995)
GPR15            32 Genomics 462 (1996)
GPR17            70 J. Neurochem. 1357 (1998)
GPR18            42 Genomics 462 (1997)
GPR20            187 Gene 75 (1997)
GPR21            187 Gene 75 (1997)
GPR22            187 Gene 75 (1997)
GPR24            398 FEBS Lett. 253 (1996)
GPR30            45 Genomics 607 (1997)
GPR31            42 Genomics 519 (1997)
GPR32            50 Genomics 281 (1997)
GPR40            239 Biochem. Biophys. Res. Commun. 543 (1997)
GPR41            239 Biochem. Biophys. Res. Commun. 543 (1997)
GPR43            239 Biochem. Biophys. Res. Commun. 543 (1997)
APJ              136 Gene 355 (1993)
BLR1             22 Eur. J. Immunol. 2759 (1992)
CEPR             231 Biochem. Biophys. Res. Commun. 651 (1997)
EBI1             23 Genomics 643 (1994)
EBI2             67 J. Virol. 2209 (1993)
ETBR-LP2         424 FEBS Lett. 193 (1998)
GPCR-CNS         54 Brain Res. Mol. Brain Res. 152 (1998);
                 45 Genomics 68 (1997)
GPR-NGA          394 FEBS Lett. 325 (1996)
H9               386 FEBS Lett. 219 (1996)
HBA954           1261 Biochim. Biophys. Acta 121 (1995)
HG38             247 Biochem. Biophys. Res. Commun. 266 (1998)
HM74             5 Int. Immunol. 1239 (1993)
OGR1             35 Genomics 397 (1996)
V28              163 Gene 295 (1995)



As will be set forth and disclosed in greater detail below, utilization of a mutational cassette to modify the endogenous sequence of a human GPCR leads to a constitutively activated version of the human GPCR. These non-endogenous, constitutively activated versions of human GPCRs can be utilized, inter alia, for the screening of candidate compounds to directly identify compounds of, e.g., therapeutic relevance.

SUMMARY OF THE INVENTION 

Disclosed herein is a non-endogenous, human G protein-coupled receptor comprising (a) as a most preferred amino acid sequence region (C-terminus to N-terminus orientation) and/or (b) as a most preferred nucleic acid sequence region (3' to 5' orientation) traversing the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions of the GPCR:

(a) $P^{1}\,AA_{15}\,X$

wherein:

(1) $P^{1}$ is an amino acid residue located within the TM6 region of the GPCR, where $P^{1}$ is selected from the group consisting of (i) the endogenous GPCR's proline residue, and (ii) a non-endogenous amino acid residue other than proline;

(2) $AA_{15}$ are 15 amino acids selected from the group consisting of (a) the endogenous GPCR's amino acids, (b) non-endogenous amino acid residues, and (c) a combination of the endogenous GPCR's amino acids and non-endogenous amino acids, excepting that none of the 15 endogenous amino acid residues that are positioned within the TM6 region of the GPCR is proline; and

(3) $X$ is a non-endogenous amino acid residue located within the IC3 region of said GPCR, preferably selected from the group consisting of lysine, histidine and arginine, and most preferably lysine, excepting that when the endogenous amino acid at position $X$ is lysine, then $X$ is an amino acid other than lysine, preferably alanine;

and/or

(b) $P^{codon}\,(AA\text{-}codon)_{15}\,X_{codon}$

wherein:

(1) $P^{codon}$ is a nucleic acid sequence within the TM6 region of the GPCR, where $P^{codon}$ encodes an amino acid selected from the group consisting of (i) the endogenous GPCR's proline residue, and (ii) a non-endogenous amino acid residue other than proline;

(2) $(AA\text{-}codon)_{15}$ are 15 codons encoding 15 amino acids selected from the group consisting of (a) the endogenous GPCR's amino acids, (b) non-endogenous amino acid residues, and (c) a combination of the endogenous GPCR's amino acids and non-endogenous amino acids, excepting that none of the 15 endogenous codons within the TM6 region of the GPCR encodes a proline amino acid residue; and

(3) $X_{codon}$ is a nucleic acid encoding region residue located within the IC3 region of said GPCR, where $X_{codon}$ encodes a non-endogenous amino acid, preferably selected from the group consisting of lysine, histidine and arginine, and most preferably lysine, excepting that when the endogenous encoding region at position $X_{codon}$ encodes the amino acid lysine, then $X_{codon}$ encodes an amino acid other than lysine, preferably alanine.
The terms endogenous and non-endogenous in reference to these sequence cassettes are relative to the endogenous GPCR. For example, once the endogenous proline residue is located within the TM6 region of a particular GPCR, and the 16th amino acid therefrom is identified for mutation to constitutively activate the receptor, it is also possible to mutate the endogenous proline residue (i.e., once the marker is located and the 16th amino acid to be mutated is identified, one may mutate the marker itself), although it is most preferred that the proline residue not be mutated. Similarly, and while it is most preferred that $AA_{15}$ be maintained in their endogenous forms, these amino acids may also be mutated. The only amino acid that must be mutated in the non-endogenous version of the human GPCR is $X$; i.e., the endogenous amino acid that is 16 residues from $P^{1}$ cannot be maintained in its endogenous form and must be mutated, as further disclosed herein. Stated again, while it is preferred that in the non-endogenous version of the human GPCR, $P^{1}$ and $AA_{15}$ remain in their endogenous forms (i.e., identical to their wild-type forms), once $X$ is identified and mutated, any and/or all of $P^{1}$ and $AA_{15}$ can be mutated. This applies to the nucleic acid sequences as well. In those cases where the endogenous amino acid at position $X$ is lysine, then in the non-endogenous version of such GPCR, $X$ is an amino acid other than lysine, preferably alanine.
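The locate-then-mutate procedure above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used by the inventors: the receptor is taken as a one-letter amino acid string written N-terminus to C-terminus, the TM6 boundaries are hypothetical inputs from some separate topology annotation, and the function names are invented for illustration. Because none of the endogenous $AA_{15}$ residues lying in TM6 may be proline, the sketch reads $P^{1}$ as the first proline met in TM6 from its N-terminal end; $X$ is then the 16th residue from $P^{1}$ toward the N-terminus.

```python
def find_tm6_marker(sequence: str, tm6_start: int, tm6_end: int) -> int:
    """Return the 0-based index of the N-terminal-most proline in TM6.

    tm6_start/tm6_end are hypothetical 0-based boundaries (end exclusive).
    """
    index = sequence.find("P", tm6_start, tm6_end)
    if index == -1:
        raise ValueError("no proline found in the supplied TM6 range")
    return index


def cassette_x_index(p1_index: int) -> int:
    """Return the 0-based index of position X for a given P1 index.

    The cassette is written C-terminus to N-terminus (P1, then AA15, then X),
    so in ordinary N-to-C numbering X lies 16 residues before P1.
    """
    x_index = p1_index - 16
    if x_index < 0:
        raise ValueError("P1 is too close to the N-terminus to define X")
    return x_index


def mutate_position_x(sequence: str, p1_index: int) -> str:
    """Return a copy of sequence with only position X changed.

    X becomes lysine (K), the most preferred residue; if the endogenous
    residue at X is already lysine, X becomes alanine (A) instead.
    P1 and AA15 keep their endogenous identities (the most preferred case).
    """
    x_index = cassette_x_index(p1_index)
    new_residue = "A" if sequence[x_index] == "K" else "K"
    return sequence[:x_index] + new_residue + sequence[x_index + 1:]


if __name__ == "__main__":
    # The hypothetical cassette "P-AACCTIGGRRRDDDE -Q" (C-to-N) rewritten N-to-C,
    # followed by three extra residues; the TM6 start index 6 is hypothetical.
    seq = "QEDDDRRRGGITCCAAP" + "LLV"
    p1 = find_tm6_marker(seq, 6, len(seq))
    print(mutate_position_x(seq, p1))
```

Working in N-to-C numbering keeps the indices aligned with how sequences are usually stored; the only place the document's C-to-N orientation matters is the subtraction in `cassette_x_index`.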

Accordingly, and as a hypothetical example, if the endogenous GPCR has the following endogenous amino acid sequence at the above-noted positions:

P-AACCTIGGRRRDDDE -Q

then any of the following exemplary and hypothetical cassettes would fall within the scope of the disclosure (in each, the non-endogenous amino acids are those that differ from the endogenous sequence above):

P-AACCTTGGRRRDDDE -K
P-AACCTTHIGRRDDDE -K
P-ADEETTGGRRRDDDE -A
P-LLKFMSTWZLVVAPQ -K
A-LLKFMSTWZLVAAPQ -K
It is also possible to add amino acid residues within $AA_{15}$, but such an approach is not particularly advanced. Indeed, in the most preferred embodiments, the only amino acid that differs in the non-endogenous version of the human GPCR as compared with the endogenous version of that GPCR is the amino acid in position $X$; mutation of this amino acid itself leads to constitutive activation of the receptor.
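Because the cassettes above are written as plain letter strings, a small comparison routine makes explicit which residues are non-endogenous and whether the one mandatory change, position $X$, has been made. The Python sketch below is a hypothetical helper for illustration only: it assumes the "P1-AA15 -X" string notation used in the examples (read C-terminus to N-terminus), and the function names are invented.

```python
def parse_cassette(text: str) -> tuple[str, str, str]:
    """Split a cassette written like 'P-AACCTIGGRRRDDDE -Q' into (P1, AA15, X)."""
    p1, aa15, x = text.replace(" ", "").split("-")
    if len(p1) != 1 or len(aa15) != 15 or len(x) != 1:
        raise ValueError(f"not a P1-AA15-X cassette: {text!r}")
    return p1, aa15, x


def non_endogenous_positions(endogenous: str, candidate: str) -> list[tuple[str, str, str]]:
    """List (position, endogenous residue, candidate residue) for each change."""
    e_p1, e_aa, e_x = parse_cassette(endogenous)
    c_p1, c_aa, c_x = parse_cassette(candidate)
    changes = []
    if e_p1 != c_p1:
        changes.append(("P1", e_p1, c_p1))
    # AA15 positions are numbered 1..15 counting away from P1.
    for i, (e, c) in enumerate(zip(e_aa, c_aa), start=1):
        if e != c:
            changes.append((f"AA15[{i}]", e, c))
    if e_x != c_x:
        changes.append(("X", e_x, c_x))
    return changes


def x_is_mutated(endogenous: str, candidate: str) -> bool:
    """True if X, the one position that must differ, has been changed."""
    return any(pos == "X" for pos, _, _ in non_endogenous_positions(endogenous, candidate))


if __name__ == "__main__":
    endogenous = "P-AACCTIGGRRRDDDE -Q"
    for candidate in ("P-AACCTIGGRRRDDDE -K", "P-ADEETTGGRRRDDDE -A"):
        print(candidate, non_endogenous_positions(endogenous, candidate))
```

A candidate whose $X$ matches the endogenous residue fails `x_is_mutated` regardless of other changes, mirroring the statement that $X$ is the only residue that must be mutated.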

Thus, in particularly preferred embodiments, $P^{1}$ and $P^{codon}$ are endogenous proline and an endogenous nucleic acid encoding region encoding proline, respectively; and $X$ and $X_{codon}$ are non-endogenous lysine or alanine and a non-endogenous nucleic acid encoding region encoding lysine or alanine, respectively, with lysine being most preferred. Because it is most preferred that the non-endogenous versions of the human GPCRs which incorporate these mutations are incorporated into mammalian cells and utilized for the screening of candidate compounds, the non-endogenous human GPCR incorporating the mutation need not be purified and isolated per se (i.e., these are incorporated within the cellular membrane of a mammalian cell), although such purified and isolated non-endogenous human GPCRs are well within the purview of this disclosure. Gene-targeted and transgenic non-human mammals (preferably rats and mice) incorporating the non-endogenous human GPCRs are also within the purview of this invention; in particular, gene-targeted mammals are most preferred in that these animals will incorporate the non-endogenous versions of the human GPCRs in place of the non-human mammal's endogenous GPCR-encoding region (techniques for generating such non-human mammals to replace the non-human mammal's protein encoding region with a human encoding region are well known; see, for example, U.S. Patent No. 5,777,194).

It has been discovered that these changes to an endogenous human GPCR render the GPCR constitutively active such that, as will be further disclosed herein, the non-endogenous, constitutively activated version of the human GPCR can be utilized for, inter alia, the direct screening of candidate compounds without the need for the endogenous ligand. Thus, methods for using these materials, and products identified by these methods, are also within the purview of the following disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows a generalized structure of a G protein-coupled receptor with the numbers assigned to the transmembrane helices, the intracellular loops, and the extracellular loops.

Figure 2 schematically shows the two states, active and inactive, for a typical G protein-coupled receptor and the linkage of the active state to the second messenger transduction pathway.
Figure 3 is a sequence diagram of the preferred vector pCMV, including restriction enzyme site locations.

Figure 4 is a diagrammatic representation of the signal measured comparing pCMV, non-endogenous, constitutively active GPR30 inhibition of GPR6-mediated activation of CRE-Luc reporter with endogenous GPR30 inhibition of GPR6-mediated activation of CRE-Luc reporter.

Figure 5 is a diagrammatic representation of the signal measured comparing pCMV, non-endogenous, constitutively activated GPR17 inhibition of GPR3-mediated activation of CRE-Luc reporter with endogenous GPR17 inhibition of GPR3-mediated activation of CRE-Luc reporter.

Figure 6 provides diagrammatic results of the signal measured comparing control pCMV, endogenous APJ and non-endogenous APJ.

Figure 7 provides an illustration of IP3 production from the non-endogenous human 5-HT2A receptor as compared to the endogenous version of this receptor.

Figure 8 presents dot-blot format results for GPR1 (8A), GPR30 (8B) and APJ (8C).



DETAILED DESCRIPTION 

40 

The scientific literature that has evolved around receptors has adopted a number of terms to refer to ligands having various effects on receptors. For clarity and consistency, the following definitions will be used throughout this patent document. To the extent that these definitions conflict with other definitions for these terms, the following definitions shall control:

AGONISTS shall mean compounds that activate the intracellular response when they bind to the receptor, or enhance GTP binding to membranes.
AMINO ACID ABBREVIATIONS used herein are set forth below:

ALANINE          ALA    A
ARGININE         ARG    R
ASPARAGINE       ASN    N
ASPARTIC ACID    ASP    D
CYSTEINE         CYS    C
GLUTAMIC ACID    GLU    E
GLUTAMINE        GLN    Q
GLYCINE          GLY    G
HISTIDINE        HIS    H
ISOLEUCINE       ILE    I
LEUCINE          LEU    L
LYSINE           LYS    K
METHIONINE       MET    M
PHENYLALANINE    PHE    F
PROLINE          PRO    P
SERINE           SER    S
THREONINE        THR    T
TRYPTOPHAN       TRP    W
TYROSINE         TYR    Y
VALINE           VAL    V



PARTIAL AGONISTS shall mean compounds which activate the intracellular response when they bind to the receptor to a lesser degree/extent than do agonists, or enhance GTP binding to membranes to a lesser degree/extent than do agonists.

ANTAGONIST shall mean compounds that competitively bind to the receptor at the same site as the agonists but which do not activate the intracellular response initiated by the active form of the receptor, and can thereby inhibit the intracellular responses by agonists or partial agonists. ANTAGONISTS do not diminish the baseline intracellular response in the absence of an agonist or partial agonist.

CANDIDATE COMPOUND shall mean a molecule (for example, and not limitation, a chemical compound) which is amenable to a screening technique. Preferably, the phrase "candidate compound" does not include compounds which were publicly known to be compounds selected from the group consisting of inverse agonist, agonist or antagonist to a receptor, as previously determined by an indirect identification process ("indirectly identified compound"); more preferably, not including an indirectly identified compound which has previously been determined to have therapeutic efficacy in at least one mammal; and, most preferably, not including an indirectly identified compound which has previously been determined to have therapeutic utility in humans.

CODON shall mean a grouping of three nucleotides (or equivalents to nucleotides) which generally comprise a nucleoside (adenosine (A), guanosine (G), cytidine (C), uridine (U) and thymidine (T)) coupled to a phosphate group and which, when translated, encodes an amino acid.
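Applied to the nucleic acid cassette $P^{codon}\,(AA\text{-}codon)_{15}\,X_{codon}$, this definition means that $X_{codon}$ lies 16 codons 5' of the TM6 proline codon. The Python sketch below illustrates the codon-level mutation under stated assumptions; it is not a protocol from this document. It assumes an in-frame DNA coding sequence written 5' to 3' and a known codon index for $P^{codon}$; the function name and the particular replacement codons (AAA for lysine, GCC for alanine) are hypothetical choices among the synonymous codons of the standard genetic code.

```python
PROLINE_CODONS = {"CCT", "CCC", "CCA", "CCG"}
LYSINE_CODONS = {"AAA", "AAG"}


def mutate_x_codon(cds: str, p_codon_index: int,
                   lysine_codon: str = "AAA", alanine_codon: str = "GCC") -> str:
    """Return a copy of an in-frame DNA coding sequence with X-codon replaced.

    cds is written 5' to 3'; p_codon_index is the 0-based codon index of the
    TM6 proline codon. X-codon is 16 codons upstream (5') of that codon.
    The replacement codons are hypothetical defaults.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    p_start = 3 * p_codon_index
    if cds[p_start:p_start + 3] not in PROLINE_CODONS:
        raise ValueError("no proline codon at the supplied codon index")
    x_codon_index = p_codon_index - 16
    if x_codon_index < 0:
        raise ValueError("proline codon too close to the 5' end to define X-codon")
    x_start = 3 * x_codon_index
    endogenous = cds[x_start:x_start + 3]
    # Lysine is most preferred; if X already encodes lysine, use alanine instead.
    replacement = alanine_codon if endogenous in LYSINE_CODONS else lysine_codon
    return cds[:x_start] + replacement + cds[x_start + 3:]


if __name__ == "__main__":
    # Hypothetical 17-codon stretch: CAG (Gln) at X, 15 GCT (Ala) codons, CCG (Pro).
    print(mutate_x_codon("CAG" + "GCT" * 15 + "CCG", 16))
```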

COMPOUND EFFICACY shall mean a measurement of the ability of a compound to inhibit or stimulate receptor functionality, as opposed to receptor binding affinity. A preferred means of detecting compound efficacy is via measurement of, e.g., [35S]GTPγS binding, as further disclosed in the Example section of this patent document.

CONSTITUTIVELY ACTIVATED RECEPTOR shall mean a receptor subject to constitutive receptor activation. In accordance with the invention disclosed herein, a non-endogenous, human constitutively activated G protein-coupled receptor is one that has been mutated to include the amino acid cassette $P^{1}\,AA_{15}\,X$, as set forth in greater detail below.

CONSTITUTIVE RECEPTOR ACTIVATION shall mean stabilization of a receptor in the active state by means other than binding of the receptor with its endogenous ligand or a chemical equivalent thereof. Preferably, a G protein-coupled receptor subjected to constitutive receptor activation in accordance with the invention disclosed herein evidences at least a 10% difference in response (increase or decrease, as the case may be) to the signal measured for constitutive activation as compared with the endogenous form of that GPCR, more preferably, about a 25% difference in such comparative response, and most preferably about a 50% difference in such comparative response. When used for the purposes of directly identifying candidate compounds, it is most preferred that the signal difference be at least about 50% such that there is a sufficient difference between the endogenous signal and the non-endogenous signal to differentiate between selected candidate compounds. In most instances, the "difference" will be an increase in signal; however, with respect to Gs-coupled GPCRs, the "difference" measured is preferably a decrease, as will be set forth in greater detail below.
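The preference tiers in this definition reduce to simple arithmetic on two signals. The Python sketch below is illustrative only: it assumes the difference is expressed relative to the endogenous signal and counted in either direction (increase or decrease), and it treats "about 25%" and "about 50%" as hard cut-offs, which is a simplification. The function names and the tier labels are hypothetical.

```python
def percent_difference(endogenous_signal: float, mutant_signal: float) -> float:
    """Absolute change of the mutant signal, as a percent of the endogenous signal.

    Increases and decreases are both counted, since the text allows either
    direction (a decrease being preferred for Gs-coupled GPCRs).
    """
    if endogenous_signal == 0:
        raise ValueError("endogenous signal must be non-zero")
    return abs(mutant_signal - endogenous_signal) / abs(endogenous_signal) * 100.0


def activation_tier(endogenous_signal: float, mutant_signal: float) -> str:
    """Map the difference onto the preference tiers named in the definition.

    Treating "about 25%" and "about 50%" as exact thresholds is an assumption.
    """
    diff = percent_difference(endogenous_signal, mutant_signal)
    if diff >= 50.0:
        return "most preferred"
    if diff >= 25.0:
        return "more preferred"
    if diff >= 10.0:
        return "preferred"
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical reporter readings (arbitrary units).
    print(activation_tier(100.0, 160.0))  # increase in signal
    print(activation_tier(100.0, 70.0))   # decrease in signal
```

Using the absolute value keeps one function for both the usual increase and the decrease case; a screen that needs the direction can compare the two signals separately.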

CONTACT or CONTACTING shall mean bringing at least two moieties together, whether in an in vitro system or an in vivo system.

DIRECTLY IDENTIFYING or DIRECTLY IDENTIFIED, in relationship to the phrase "candidate compound", shall mean the screening of a candidate compound against a constitutively activated G protein-coupled receptor, and assessing the compound efficacy of such compound. This phrase is, under no circumstances, to be interpreted or understood to be encompassed by or to encompass the phrase "indirectly identifying" or "indirectly identified."

ENDOGENOUS shall mean a material that is naturally produced by the genome of the species. ENDOGENOUS in reference to, for example and not limitation, GPCR, shall mean that which is naturally produced by a human, an insect, a plant, a bacterium, or a virus. By contrast, the term NON-ENDOGENOUS in this context shall mean that which is not naturally produced by the genome of a species. For example, and not limitation, a receptor which is not constitutively active in its endogenous form, but which becomes constitutively active when mutated by using the cassettes disclosed herein, is most preferably referred to herein as a "non-endogenous, constitutively activated receptor." Both terms can be utilized to describe both "in vivo" and "in vitro" systems. For example, and not limitation, in a screening approach, the endogenous or non-endogenous receptor may be in reference to an in vitro screening system whereby the receptor is expressed on the cell surface of a mammalian cell. As a further example and not limitation, where the genome of a mammal has been manipulated to include a non-endogenous constitutively activated receptor, screening of a candidate compound by means of an in vivo system is viable.

HOST CELL shall mean a cell capable of having a Plasmid and/or Vector incorporated
therein. In the case of a prokaryotic Host Cell, a Plasmid is typically replicated as an autonomous
molecule as the Host Cell replicates (generally, the Plasmid is thereafter isolated for introduction
into a eukaryotic Host Cell); in the case of a eukaryotic Host Cell, a Plasmid is integrated into the
cellular DNA of the Host Cell such that when the eukaryotic Host Cell replicates, the Plasmid
replicates. Preferably, for the purposes of the invention disclosed herein, the Host Cell is
eukaryotic, more preferably mammalian, and most preferably selected from the group consisting
of 293, 293T and COS-7 cells.
INDIRECTLY IDENTIFYING or INDIRECTLY IDENTIFIED means the traditional
approach to the drug discovery process involving identification of an endogenous ligand specific
for an endogenous receptor, screening of candidate compounds against the receptor for
determination of those which interfere and/or compete with the ligand-receptor interaction, and
assessing the efficacy of the compound for affecting at least one second messenger pathway
associated with the activated receptor.

INHIBIT or INHIBITING, in relationship to the term "response" shall mean that a
response is decreased or prevented in the presence of a compound as opposed to in the absence of
the compound.
INVERSE AGONISTS shall mean compounds which bind to either the endogenous form
of the receptor or to the constitutively activated form of the receptor, and which inhibit the
baseline intracellular response initiated by the active form of the receptor below the normal base
level of activity which is observed in the absence of agonists or partial agonists, or decrease GTP
binding to membranes. Preferably, the baseline intracellular response is inhibited in the presence
of the inverse agonist by at least 30%, more preferably by at least 50%, and most preferably by at
least 75%, as compared with the baseline response in the absence of the inverse agonist.
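The 30%, 50% and 75% tiers in this definition reduce to simple percent-inhibition arithmetic. The sketch below is a minimal illustration only, assuming one baseline reading (no compound) and one reading in the presence of the test compound, both in the same arbitrary units; the function names and the numbers used are hypothetical and not part of the disclosure.

```python
def percent_inhibition(baseline: float, treated: float) -> float:
    """Percent decrease of a response relative to its baseline (no compound)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - treated) / baseline


def inverse_agonist_tier(baseline: float, treated: float) -> str:
    """Map percent inhibition onto the 30% / 50% / 75% tiers of the definition."""
    pct = percent_inhibition(baseline, treated)
    if pct >= 75:
        return "most preferred"
    if pct >= 50:
        return "more preferred"
    if pct >= 30:
        return "preferred"
    return "below threshold"


# Hypothetical readings in arbitrary units
print(percent_inhibition(200.0, 90.0))    # 55.0
print(inverse_agonist_tier(200.0, 90.0))  # more preferred
```

A reading of 90 against a baseline of 200 is a 55% inhibition, which clears the 50% tier but not the 75% tier.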

KNOWN RECEPTOR shall mean an endogenous receptor for which the endogenous
ligand specific for that receptor has been identified.
LIGAND shall mean an endogenous, naturally occurring molecule specific for an
endogenous, naturally occurring receptor.

MUTANT or MUTATION in reference to an endogenous receptor's nucleic acid and/or
amino acid sequence shall mean a specified change or changes to such endogenous sequences such
that a mutated form of an endogenous, non-constitutively activated receptor evidences constitutive
activation of the receptor. In terms of equivalents to specific sequences, a subsequent mutated
form of a human receptor is considered to be equivalent to a first mutation of the human receptor
if (a) the level of constitutive activation of the subsequent mutated form of the receptor is
substantially the same as that evidenced by the first mutation of the receptor; and (b) the percent
sequence (amino acid and/or nucleic acid) homology between the subsequent mutated form of the
receptor and the first mutation of the receptor is at least about 80%, more preferably at least about
90%, and most preferably at least 95%. Ideally, and owing to the fact that the most preferred
cassettes disclosed herein for achieving constitutive activation include a single amino acid and/or
codon change between the endogenous and the non-endogenous forms of the GPCR (i.e., X or
X_codon), the percent sequence homology should be at least 98%.
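The 98% figure follows from the single-substitution design: one changed residue in a sequence of length n leaves (n - 1)/n of the positions identical, so 98% identity holds whenever n is at least 50, and GPCRs run to several hundred residues. The sketch below assumes simple percent identity over two ungapped, equal-length sequences as a stand-in for the "homology" measure, which the text does not define further; both helper names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def min_length_for_identity(threshold_pct: float, substitutions: int = 1) -> int:
    """Smallest length n at which `substitutions` changes still leave
    at least `threshold_pct` percent identity."""
    if not 0 <= threshold_pct < 100 or substitutions < 1:
        raise ValueError("need 0 <= threshold < 100 and at least one substitution")
    n = substitutions
    while 100.0 * (n - substitutions) / n < threshold_pct:
        n += 1
    return n


# A single substitution in a toy 10-residue sequence
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKK"))  # 90.0
print(min_length_for_identity(98.0))                 # 50
```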

ORPHAN RECEPTOR shall mean an endogenous receptor for which the endogenous
ligand specific for that receptor has not been identified or is not known.

PHARMACEUTICAL COMPOSITION shall mean a composition comprising at least
one active ingredient, whereby the composition is amenable to investigation for a specified,
efficacious outcome in a mammal (for example, and not limitation, a human). Those of ordinary
skill in the art will understand and appreciate the techniques appropriate for determining whether
an active ingredient has a desired efficacious outcome based upon the needs of the artisan.

PLASMID shall mean the combination of a Vector and cDNA. Generally, a Plasmid is
introduced into a Host Cell for the purpose of replication and/or expression of the cDNA as a
protein.

STIMULATE or STIMULATING, in relationship to the term "response" shall mean that
a response is increased in the presence of a compound as opposed to in the absence of the
compound.

TRANSVERSE or TRANSVERSING, in reference to either a defined nucleic acid
sequence or a defined amino acid sequence, shall mean that the sequence is located within at least
two different and defined regions. For example, in an amino acid sequence that is 10 amino acid
moieties in length, where 3 of the 10 moieties are in the TM6 region of a GPCR and the remaining
7 moieties are in the IC3 region of the GPCR, the 10 amino acid moiety can be described as
transversing the TM6 and IC3 regions of the GPCR.
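The worked example in this definition can be checked mechanically by asking which annotated regions a residue interval overlaps. In this minimal sketch the region coordinates are hypothetical placeholders (0-based, end-exclusive intervals) chosen to reproduce the 7-in-IC3, 3-in-TM6 example; real boundaries would have to come from a hydrophobicity or topology analysis, and the function names are illustrative only.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # 0-based, end-exclusive residue interval


def regions_spanned(start: int, end: int, regions: Dict[str, Region]) -> List[str]:
    """Names of the regions that the interval [start, end) overlaps."""
    return [name for name, (r_start, r_end) in regions.items()
            if start < r_end and r_start < end]


def is_transversing(start: int, end: int, regions: Dict[str, Region]) -> bool:
    """True if the interval lies within at least two different regions."""
    return len(regions_spanned(start, end, regions)) >= 2


# Hypothetical coordinates: residues 0-6 in IC3, 7-29 in TM6.
# The 10-residue span 0-9 therefore has 7 residues in IC3 and 3 in TM6.
regions = {"IC3": (0, 7), "TM6": (7, 30)}
print(regions_spanned(0, 10, regions))  # ['IC3', 'TM6']
print(is_transversing(0, 10, regions))  # True
```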

VECTOR in reference to cDNA shall mean a circular DNA capable of incorporating at
least one cDNA and capable of incorporation into a Host Cell.

The order of the following sections is set forth for presentational efficiency and is not
intended, nor should be construed, as a limitation on the disclosure or the claims to follow.
A. Introduction 

The traditional study of receptors has always proceeded from the a priori assumption
(historically based) that the endogenous ligand must first be identified before discovery could
proceed to find antagonists and other molecules that could affect the receptor. Even in cases
where an antagonist might have been known first, the search immediately extended to looking for
the endogenous ligand. This mode of thinking has persisted in receptor research even after the
discovery of constitutively activated receptors. What has not been heretofore recognized is that
it is the active state of the receptor that is most useful for discovering agonists, partial agonists, and
inverse agonists of the receptor. For those diseases which result from an overly active receptor or
an under-active receptor, what is desired in a therapeutic drug is a compound which acts to
diminish the active state of a receptor or enhance the activity of the receptor, respectively, not
necessarily a drug which is an antagonist to the endogenous ligand. This is because a compound
that reduces or enhances the activity of the active receptor state need not bind at the same site as
the endogenous ligand. Thus, as taught by a method of this invention, any search for therapeutic
compounds should start by screening compounds against the ligand-independent active state.

Screening candidate compounds against non-endogenous, constitutively activated GPCRs
allows for the direct identification of candidate compounds which act at these cell surface
receptors, without requiring any prior knowledge or use of the receptor's endogenous ligand. By
determining areas within the body where the endogenous versions of such GPCRs are expressed
and/or over-expressed, it is possible to determine related disease/disorder states which are
associated with the expression and/or over-expression of these receptors; such an approach is
disclosed in this patent document.
B. Disease/Disorder Identification and/or Selection

Most preferably, inverse agonists to the non-endogenous, constitutively activated GPCRs
can be identified using the materials of this invention. Such inverse agonists are ideal candidates
as lead compounds in drug discovery programs for treating diseases related to these receptors.
Because of the ability to directly identify inverse agonists, partial agonists or agonists to these
receptors, thereby allowing for the development of pharmaceutical compositions, a search for
diseases and disorders associated with these receptors is possible. For example, scanning both
diseased and normal tissue samples for the presence of these receptors now becomes more than an
academic exercise or one which might be pursued along the path of identifying, in the case of an
orphan receptor, an endogenous ligand. Tissue scans can be conducted across a broad range of
healthy and diseased tissues. Such tissue scans provide a preferred first step in associating a
specific receptor with a disease and/or disorder.

Preferably, the DNA sequence of the endogenous GPCR is used to make a probe for either
radiolabeled cDNA or RT-PCR identification of the expression of the GPCR in tissue samples.
The presence of a receptor in a diseased tissue, or the presence of the receptor at elevated or
decreased concentrations in diseased tissue compared to a normal tissue, can be preferably utilized
to identify a correlation with that disease. Receptors can equally well be localized to regions of
organs by this technique. Based on the known functions of the specific tissues to which the
receptor is localized, the putative functional role of the receptor can be deduced.

C. A "Human GPCR Proline Marker" Algorithm and the Creation of 
Non-Endogenous, Constitutively-Active Human GPCRs 

Among the many challenges facing the biotechnology arts is the unpredictability in
gleaning genetic information from one species and correlating that information to another species;
nowhere in this art does this problem evidence more annoying exacerbation than in the genetic
sequences that encode nucleic acids and proteins. Thus, for consistency and because of the highly
unpredictable nature of this art, the following invention is limited, in terms of mammals, to human
GPCRs; applicability of this invention to other mammalian species, while a potential possibility,
is considered beyond mere rote application.

In general, when attempting to apply common "rules" from one related protein sequence
to another or from one species to another, the art has typically resorted to sequence alignment, i.e.,
sequences are linearized and attempts are then made to find regions of commonality between two
or more sequences. While useful, this approach does not always yield meaningful
information. In the case of GPCRs, while the general structural motif is identical for all GPCRs,
the variations in lengths of the TMs, ECs and ICs make such alignment approaches from one
GPCR to another difficult at best. Thus, while it may be desirable to apply a consistent approach
to, e.g., constitutive activation from one GPCR to another, because of the great diversity in
sequence length, fidelity, etc., from one GPCR to the next, a generally applicable and readily
successful mutational alignment approach is in essence not possible. By analogy, such an
approach is akin to having a traveler start a journey at point A by giving the traveler dozens of
different maps to point B, without any scale or distance markers on any of the maps, and then
asking the traveler to find the shortest and most efficient route to destination B only by using the
maps. In such a situation, the task can be readily simplified by having (a) a common "place-marker"
on each map, and (b) the ability to measure the distance from the place-marker to
destination B; this, then, will allow the traveler to select the most efficient route from starting-point A
to destination B.

In essence, a feature of the invention is to provide such coordinates within human GPCRs
that readily allow for creation of a constitutively active form of the human GPCRs.

As those in the art appreciate, the transmembrane region of a cell is highly hydrophobic;
thus, using standard hydrophobicity plotting techniques, those in the art are readily able to
determine the TM regions of a GPCR, and specifically TM6 (this same approach is also
applicable to determining the EC and IC regions of the GPCR). It has been discovered that within
the TM6 region of human GPCRs, a common proline residue (generally near the middle of TM6)
acts as a constitutive activation "marker." By counting 15 amino acids from the proline marker
toward the N-terminus (i.e., in the C-terminus to N-terminus orientation P AA15 X), the 16th
amino acid (which is located in the IC3 loop), when mutated from its endogenous form
to a non-endogenous form, leads to constitutive activation of the receptor. For convenience, we
refer to this as the "Human GPCR Proline Marker" Algorithm. Although the non-endogenous
amino acid at this position can be any of the amino acids, most preferably, the non-endogenous
amino acid is lysine. While not wishing to be bound by any theory, we believe that this position
itself is unique and that the mutation at this location impacts the receptor to allow for constitutive
activation.
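The text relies on "standard hydrophobicity plotting techniques" without naming one. As an assumption, the sketch below uses the Kyte-Doolittle hydropathy scale with a sliding window, a widely used choice; the 19-residue window and the 1.6 cut-off are conventional heuristics for flagging candidate membrane-spanning segments, not values taken from this disclosure, and the function names are hypothetical. Each window average is assigned to the residue at the window's centre, and consecutive residues at or above the cut-off are merged into candidate segments.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Average hydropathy of each full window; entry i covers seq[i:i + window]."""
    scores = [KD[aa] for aa in seq.upper()]  # KeyError on non-standard letters
    if window < 1 or window > len(scores):
        return []
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """(start, end) spans, 0-based and end-exclusive, of residues whose
    centred window average is at or above `threshold`."""
    half = window // 2
    segments: List[Tuple[int, int]] = []
    run_start = None
    profile = hydropathy_profile(seq, window)
    for i, value in enumerate(profile):
        centre = i + half  # residue at the middle of window i
        if value >= threshold and run_start is None:
            run_start = centre
        elif value < threshold and run_start is not None:
            segments.append((run_start, centre))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(profile) + half))
    return segments


# Toy sequence: a hydrophobic leucine core flanked by charged aspartates
toy = "D" * 10 + "L" * 20 + "D" * 10
print(candidate_tm_segments(toy))  # [(14, 26)]
```

Window averaging smooths the profile, so segment edges are approximate; the output is a starting point for assigning TM6, not a definitive topology.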

We note that, for example, when the endogenous amino acid at the 16th position is already
lysine (as is the case with GPR4 and GPR32), then in order for X to be a non-endogenous amino
acid, it must be other than lysine; thus, in those situations where the endogenous GPCR has an
endogenous lysine residue at the 16th position, the non-endogenous version of that GPCR
preferably incorporates an amino acid other than lysine, preferably alanine, histidine or arginine,
at this position. Of further note, it has been determined that GPR4 appears to be linked to Gs and
active in its endogenous form (data not shown).
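The Human GPCR Proline Marker Algorithm, including the fallback when the 16th residue is already lysine, can be sketched as a short sequence edit. This sketch assumes the index of the TM6 proline marker has already been located (for example, from a hydropathy-based TM6 assignment), uses 0-based indices, and counts toward the N-terminus, consistent with the C-terminus to N-terminus orientation P AA15 X given in the abstract. The function name is hypothetical, and so is the order in which the alanine, histidine and arginine fallbacks are tried; the text does not rank them.

```python
from typing import Tuple


def proline_marker_mutation(seq: str, tm6_proline: int,
                            preferred: str = "K",
                            fallbacks: str = "AHR") -> Tuple[str, str]:
    """Substitute the residue 16 positions N-terminal to the TM6 proline marker.

    Returns (mutated_sequence, label); the label uses conventional
    1-based numbering, e.g. "E6K".
    """
    if not 0 <= tm6_proline < len(seq) or seq[tm6_proline] != "P":
        raise ValueError("tm6_proline must index a proline in seq")
    target = tm6_proline - 16  # skip the 15 intervening residues (AA15)
    if target < 0:
        raise ValueError("no residue 16 positions N-terminal to the marker")
    endogenous = seq[target]
    if endogenous != preferred:
        new = preferred
    else:
        # Endogenous residue is already lysine (as in GPR4), so choose a
        # different residue to keep X non-endogenous.
        new = next(aa for aa in fallbacks if aa != endogenous)
    mutated = seq[:target] + new + seq[target + 1:]
    return mutated, f"{endogenous}{target + 1}{new}"


toy = "AAAAAE" + "G" * 15 + "P" + "LLLLL"  # marker proline at index 21
mutated, label = proline_marker_mutation(toy, 21)
print(label)  # E6K
```

If the residue at the target position is already lysine, the same call returns an alanine substitution instead (label "K6A" for the analogous toy sequence).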

Because there are only 20 naturally occurring amino acids (although the use of
non-naturally occurring amino acids is also viable), selection of a particular non-endogenous amino
acid for substitution at this 16th position is viable and allows for efficient selection of a
non-endogenous amino acid that fits the needs of the investigator. However, as noted, the more
preferred non-endogenous amino acids at the 16th position are lysine, histidine, arginine and
alanine, with lysine being most preferred. Those of ordinary skill in the art are credited with the
ability to readily determine proficient methods for changing the sequence of a codon to achieve
a desired mutation.

It has also been discovered that occasionally, but not always, the proline residue marker
will be preceded in TM6 by WZ (i.e., WZP AA15 X), where W is tryptophan and Z is any amino
acid residue.
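Because the WZ prefix appears only occasionally, it is best treated as a supporting hint when choosing among prolines in a putative TM6 segment rather than as a requirement. A minimal regular-expression sketch, assuming a one-letter amino acid string and an optional, hypothetical (start, end) window standing in for the TM6 assignment; the function name is illustrative only.

```python
import re
from typing import List, Optional, Tuple


def wzp_prolines(seq: str,
                 region: Optional[Tuple[int, int]] = None) -> List[int]:
    """0-based indices of prolines preceded by W and any one residue (W-Z-P).

    If `region` = (start, end) is given (end exclusive), only prolines inside
    it are returned, e.g. a putative TM6 span.
    """
    # Zero-width lookahead so overlapping motifs such as "WWPP" are all reported
    hits = [m.start() + 2 for m in re.finditer(r"(?=W.P)", seq.upper())]
    if region is not None:
        start, end = region
        hits = [i for i in hits if start <= i < end]
    return hits


print(wzp_prolines("AAWLPAAWKPG"))           # [4, 9]
print(wzp_prolines("AAWLPAAWKPG", (5, 11)))  # [9]
```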

Our discovery, amongst other things, negates the need for unpredictable and complicated
sequence alignment approaches commonly used by the art. Indeed, the strength of our discovery,
while an algorithm in nature, is that it can be applied in a facile manner to human GPCRs, with
dexterous simplicity by those in the art, to achieve a unique and highly useful end-product, i.e., a
constitutively activated version of a human GPCR. Because many years and significant amounts
of money will be required to determine the endogenous ligands for the human GPCRs that the
Human Genome Project is uncovering, the disclosed invention not only reduces the time necessary
to positively exploit this sequence information, but does so at significant cost savings. This approach truly
validates the importance of the Human Genome Project because it allows for the utilization of
genetic information not only to understand the role of the GPCRs in, e.g., diseases, but also to
provide the opportunity to improve the human condition.
D. Screening of Candidate Compounds 

1. Generic GPCR screening assay techniques

When a G protein receptor becomes constitutively active, it couples to a G protein (e.g.,
Gq, Gs, Gi, Go) and stimulates the release of GDP and the subsequent binding of GTP to the G protein. The G
protein then acts as a GTPase and slowly hydrolyzes the GTP to GDP, whereby the receptor,
under normal conditions, becomes deactivated. However, constitutively activated receptors,
including the non-endogenous, human constitutively active GPCRs of the present invention,
continue to exchange GDP for GTP. A non-hydrolyzable analog of GTP, [35S]GTPγS, can be
used to monitor enhanced binding to G proteins present on membranes which express
constitutively activated receptors. It is reported that [35S]GTPγS can be used to monitor G protein
coupling to membranes in the absence and presence of ligand. An example of this monitoring,
among other examples well-known and available to those in the art, was reported by Traynor and
Nahorski in 1995. The preferred use of this assay system is for initial screening of candidate
compounds because the system is generically applicable to all G protein-coupled receptors
regardless of the particular G protein that interacts with the intracellular domain of the receptor.

2. Specific GPCR screening assay techniques

Once candidate compounds are identified using the "generic" G protein-coupled
receptor assay (i.e., an assay to select compounds that are agonists, partial
agonists, or inverse agonists), further screening to confirm that the compounds have
interacted at the receptor site is preferred. For example, a compound identified by the
"generic" assay may not bind to the receptor, but may instead merely "uncouple" the G
protein from the intracellular domain.

a. Gs and Gi.

Gs stimulates the enzyme adenylyl cyclase. Gi (and Go), on the other hand,
inhibit this enzyme. Adenylyl cyclase catalyzes the conversion of ATP to cAMP; thus,
constitutively activated GPCRs that couple the Gs protein are associated with increased
cellular levels of cAMP. On the other hand, constitutively activated GPCRs that couple the
Gi (or Go) protein are associated with decreased cellular levels of cAMP. See, generally,
"Indirect Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.)
Nichols, J.G. et al., eds. Sinauer Associates, Inc. (1992). Thus, assays that detect cAMP can
be utilized to determine if a candidate compound is, e.g., an inverse agonist to the receptor
(i.e., such a compound would decrease the levels of cAMP). A variety of approaches known
in the art for measuring cAMP can be utilized; a most preferred approach relies upon the use
of anti-cAMP antibodies in an ELISA-based format.

Another type of assay that can be
utilized is a whole cell second messenger reporter system assay. Promoters on genes drive the
expression of the proteins that a particular gene encodes. Cyclic AMP drives gene expression by
promoting the binding of a cAMP-responsive DNA binding protein or transcription factor (CREB)
which then binds to the promoter at specific sites called cAMP response elements and drives the
expression of the gene. Reporter systems can be constructed which have a promoter containing
multiple cAMP response elements before the reporter gene, e.g., β-galactosidase or luciferase.
Thus, a constitutively activated Gs-linked receptor causes the accumulation of cAMP that then
activates the gene and expression of the reporter protein. The reporter protein such as
β-galactosidase or luciferase can then be detected using standard biochemical assays (Chen et al.
1995). With respect to GPCRs that link to Gi (or Go), and thus decrease levels of cAMP, an
approach to the screening of, e.g., inverse agonists, based upon utilization of receptors that link to
Gs (and thus increase levels of cAMP) is disclosed in the Example section with respect to GPR17
and GPR30.
b. Go and Gq.

Gq and Go are associated with activation of the enzyme phospholipase C,
which in turn hydrolyzes the phospholipid PIP2, releasing two intracellular messengers:
diacylglycerol (DAG) and inositol 1,4,5-trisphosphate (IP3). Increased accumulation of IP3
is associated with activation of Gq- and Go-associated receptors. See, generally, "Indirect
Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.) Nichols,
J.G. et al., eds. Sinauer Associates, Inc. (1992). Assays that detect IP3 accumulation can be
utilized to determine if a candidate compound is, e.g., an inverse agonist to a Gq- or
Go-associated receptor (i.e., such a compound would decrease the levels of IP3). Gq-associated
receptors can also be examined using an AP1 reporter assay in that Gq-dependent
phospholipase C causes activation of genes containing AP1 elements; thus, activated
Gq-associated receptors will evidence an increase in the expression of such genes, whereby
inverse agonists thereto will evidence a decrease in such expression, and agonists will
evidence an increase in such expression. Assays for such detection are commercially
available.
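The direction-of-change logic in sections a and b can be collected into one lookup. This is a minimal sketch, assuming each read-out is compared against the constitutively active baseline; the table name, the read-out labels (for instance "CRE reporter" for the cAMP-response-element reporter described above) and the function are hypothetical. Go is entered under both cAMP and IP3 because the text groups it with Gi in section a and with Gq in section b.

```python
from typing import Dict, List, Tuple

# coupling -> list of (read-out, direction caused by constitutive activation),
# where +1 = increase and -1 = decrease
READOUTS: Dict[str, List[Tuple[str, int]]] = {
    "Gs": [("cAMP", +1), ("CRE reporter", +1)],
    "Gi": [("cAMP", -1)],
    "Go": [("cAMP", -1), ("IP3", +1)],  # Go appears in both sections
    "Gq": [("IP3", +1), ("AP1 reporter", +1)],
}


def expected_change(coupling: str, compound_class: str) -> List[Tuple[str, str]]:
    """Direction each read-out should move, relative to the constitutively
    active baseline, for an "agonist" or an "inverse agonist"."""
    # An agonist pushes activity further in the activated direction;
    # an inverse agonist pushes it back the opposite way.
    sign = {"agonist": +1, "inverse agonist": -1}[compound_class]
    words = {+1: "increase", -1: "decrease"}
    return [(readout, words[direction * sign])
            for readout, direction in READOUTS[coupling]]


print(expected_change("Gi", "inverse agonist"))  # [('cAMP', 'increase')]
```

The Gi row shows why, for a Gi-coupled receptor, an inverse agonist is expected to raise cAMP, the opposite of the Gs case.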
E. Medicinal Chemistry

Generally, but not always, direct identification of candidate compounds is preferably
conducted in conjunction with compounds generated via combinatorial chemistry techniques,
whereby thousands of compounds are randomly prepared for such analysis. Generally, the
results of such screening will be compounds having unique core structures; thereafter, these
compounds are preferably subjected to additional chemical modification around a preferred
core structure(s) to further enhance the medicinal properties thereof. Such techniques are
known to those in the art and will not be addressed in detail in this patent document.

F. Pharmaceutical Compositions 

Candidate compounds selected for further development can be formulated into
pharmaceutical compositions using techniques well known to those in the art. Suitable
pharmaceutically-acceptable carriers are available to those in the art; for example, see Remington's
Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Co. (Osol et al., eds.).

G. Other Utility

Although a preferred use of the non-endogenous versions of the disclosed human GPCRs
is for the direct identification of candidate compounds as inverse agonists, agonists or partial
agonists (preferably for use as pharmaceutical agents), these receptors can also be utilized in
research settings. For example, in vitro and in vivo systems incorporating these receptors can be
utilized to further elucidate and understand the roles of the receptors in the human condition, both
normal and diseased, as well as to understand the role of constitutive activation as it applies to
the signaling cascade. A value in these non-endogenous receptors is that their
utility as a research tool is enhanced in that, because of their unique features, the disclosed
receptors can be used to understand the role of a particular receptor in the human body before the
endogenous ligand therefor is identified. Other uses of the disclosed receptors will become
apparent to those in the art based upon, inter alia, a review of this patent document.

EXAMPLES 

The following examples are presented for purposes of elucidation, and not limitation,
of the present invention. Following the teaching of this patent document that a mutational
cassette may be utilized in the IC3 loop of human GPCRs based upon a position relative to
a proline residue in TM6 to constitutively activate the receptor, and while specific nucleic acid
and amino acid sequences are disclosed herein, those of ordinary skill in the art are credited
with the ability to make minor modifications to these sequences while achieving the same or
substantially similar results reported below. Particular approaches to sequence mutations are
within the purview of the artisan based upon the particular needs of the artisan.

Example 1

Preparation of Endogenous Human GPCRs 

A variety of GPCRs were utilized in the Examples to follow. Some endogenous human
GPCRs were graciously provided in expression vectors (as acknowledged below) and other
endogenous human GPCRs were synthesized de novo using publicly-available sequence
information.

1. GPR1 (GenBank Accession Number: U13666)

The human cDNA sequence for GPR1 was provided in pRcCMV by Brian
O'Dowd (University of Toronto). GPR1 cDNA (1.4kB fragment) was excised from the pRcCMV
vector as an NdeI-XbaI fragment and was subcloned into the NdeI-XbaI site of pCMV vector (see
Figure 3). Nucleic acid (SEQ.ID.NO.: 1) and amino acid (SEQ.ID.NO.: 2) sequences for human
GPR1 were thereafter determined and verified.

2. GPR4 (GenBank Accession Numbers: L36148, U35399, U21051)
The human cDNA sequence for GPR4 was provided in pRcCMV by Brian
O'Dowd (University of Toronto). GPR4 cDNA (1.4kB fragment) was excised from the pRcCMV
vector as an ApaI(blunted)-XbaI fragment and was subcloned (with most of the 5' untranslated
region removed) into the HindIII(blunted)-XbaI site of pCMV vector. Nucleic acid (SEQ.ID.NO.: 3)
and amino acid (SEQ.ID.NO.: 4) sequences for human GPR4 were thereafter determined and
verified.

3. GPR5 (GenBank Accession Number: L36149) 
The cDNA for human GPR5 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 64°C
for 1 min; and 72°C for 1.5 min. The 5' PCR primer contained an EcoRI site with the sequence:
5^-rATG.\ATTCAGATGCTCTA.\AC:GTCCCTGC-3- (SEQ.ID.NO.: 5) 
and the 3' primer contained a BamHI site with the sequence:
5'-TCCGGATCCACCTGCACCTGCGCCTGCACC-3' (SEQ.ID.NO.: 6).
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 7) and amino acid (SEQ.ID.NO.:
8) sequences for human GPR5 were thereafter determined and verified.
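Each primer in these Examples is described by the restriction site it carries, and a quick sanity check is to scan the primer for the enzyme's recognition sequence. The sketch below lists recognition sequences for enzymes named in these Examples (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT, XbaI TCTAGA, NdeI CATATG, EcoRV GATATC, SmaI CCCGGG, ApaI GGGCCC); all eight are palindromic, so scanning the top strand is sufficient. The function name is hypothetical, and the check confirms only that a site is present, not that the flanking bases allow efficient cutting.

```python
import re
from typing import Dict, List

# Recognition sequences (5'->3') for enzymes named in these Examples.
# All are palindromic, so the top strand alone is enough to scan.
SITES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "NdeI": "CATATG",
    "EcoRV": "GATATC",
    "SmaI": "CCCGGG",
    "ApaI": "GGGCCC",
}


def find_sites(primer: str) -> Dict[str, List[int]]:
    """Map each enzyme whose site occurs in `primer` to its 0-based start positions."""
    primer = primer.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, site in SITES.items():
        # Lookahead so overlapping occurrences are all found
        positions = [m.start() for m in re.finditer(f"(?={site})", primer)]
        if positions:
            hits[enzyme] = positions
    return hits


# 3' GPR5 primer, SEQ.ID.NO.: 6
print(find_sites("TCCGGATCCACCTGCACCTGCGCCTGCACC"))  # {'BamHI': [3]}
```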

4. GPR7 (GenBank Accession Number: U22491)
The cDNA for human GPR7 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 62°C for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site
with the sequence:

5'-GCAAGCTTGGGGGACGCCAGGTCGCCGGCT-3' (SEQ.ID.NO.: 9) 
and the 3' primer contained a BamHI site with the sequence:
5'-GCGGATCCGGACGCTGGGGGAGTCAGGCTGC-3' (SEQ.ID.NO.: 10).

The 1.1 kb PCR fragment was digested with HindIII and BamHI and cloned into HindIII-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 11) and amino acid (SEQ.ID.NO.:
12) sequences for human GPR7 were thereafter determined and verified.

5. GPR8 (GenBank Accession Number: L72492) 

The cDNA for human GPR8 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-GGGAATTCGTGAAGGGTGGGAGGTAGAATG-3' (SEQ.ID.NO.: 13)
and the 3' primer contained a BamHI site with the sequence:

5'-ATGGATCGCAGGCCCTTCAGCACCGCA.>\TAT-3'(SEQ.ID.NO.: 14). 
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. All 4 cDNA clones sequenced contained a possible
polymorphism involving a change of amino acid 206 from Arg to Gln. Aside from this
difference, nucleic acid (SEQ.ID.NO.: 15) and amino acid (SEQ.ID.NO.: 16) sequences for human
GPR8 were thereafter determined and verified.

6. GPR9 (GenBank Accession Number: X95876) 
The cDNA for human GPR9 was generated and cloned into pCMV expression
vector as follows: PCR was performed using a clone (provided by Brian O'Dowd) as template and
Pfu polymerase (Stratagene) with the buffer system provided by the manufacturer supplemented
with 10% DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4 nucleotides. The cycle
condition was 25 cycles of: 94°C for 1 min; for 1 min; and 72°C for 2.5 min. The 5' PCR
primer contained an EcoRI site with the sequence:
5'-ACGAATTCAGCCA7GGTCCrTGAGGTGAGTGACCACCAAGTGCTAAAT-3'
(SEQ.ID.NO.: 17)

and the 3' primer contained a BamHI site with the sequence:
5'-GAGGATCCTGGAATGCGGGGAAGTCAG-3' (SEQ.ID.NO.: 18).
The 1.2 kb PCR fragment was digested with EcoRI and cloned into EcoRI-SmaI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 19) and amino acid (SEQ.ID.NO.: 20) sequences
for human GPR9 were thereafter determined and verified.

7. GPR9-6 (GenBank Accession Number: U45982)

The cDNA for human GPR9-6 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-TTAAGCTTGACGTAATGCCATCTTGTGTCC-3' (SEQ.ID.NO.: 21)
and the 3' primer contained a BamHI site with the sequence:
5'-TTGGATCCAAAAGAACCATGCACCTCAGAG-3' (SEQ.ID.NO.: 22).
The 1.1 kb PCR fragment was digested with BamHI and cloned into EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 23) and amino acid (SEQ.ID.NO.: 24)
sequences for human GPR9-6 were thereafter determined and verified.

8. GPR10 (GenBank Accession Number: L32672)

The human cDNA sequence for GPR10 was provided in pRcCMV by Brian
O'Dowd (University of Toronto). GPR10 cDNA (1.3kB fragment) was excised from the
pRcCMV vector as an EcoRI-XbaI fragment and was subcloned into EcoRI-XbaI site of pCMV
vector. Nucleic acid (SEQ.ID.NO.: 25) and amino acid (SEQ.ID.NO.: 26) sequences for human
GPR10 were thereafter determined and verified.

9. GPR15 (GenBank Accession Number: U34806) 

The human cDNA sequence for GPR15 was provided in pCDNA3 by Brian
O'Dowd (University of Toronto). GPR15 cDNA (1.5kB fragment) was excised from the
pCDNA3 vector as a HindIII-Bam fragment and was subcloned into HindIII-Bam site of pCMV
vector. Nucleic acid (SEQ.ID.NO.: 27) and amino acid (SEQ.ID.NO.: 28) sequences for human
GPR15 were thereafter determined and verified.

10. GPR17 (GenBank Accession Number: Z94154) 

The cDNA for human GPR17 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for
1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-CTAGAATTCTGACTGCAGCCAAAGCATGAAT-3' (SEQ.ID.NO.: 29) and the 3' primer
contained a BamHI site with the sequence:

5'-GCTGGATGGTAAACAGTCTGCGGTCGGCCT-3' (SEQ.ID.NO.: 30).
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 31) and amino acid (SEQ.ID.NO.:
32) sequences for human GPR17 were thereafter determined and verified.
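The thermal cycling conditions repeated throughout this Example lend themselves to a small data representation. The Python sketch below is illustrative only: the `Step` class and `program_seconds` helper are hypothetical names, and the total counts only the stated hold times (ramp rates and any initial or final hold steps are not given in the text and are ignored). It encodes the GPR17 program described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a PCR cycle (hypothetical helper type)."""
    name: str
    temp_c: float
    seconds: int


def program_seconds(steps: list[Step], cycles: int) -> int:
    """Sum of hold times over all cycles; ramp times are not modeled."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return cycles * sum(step.seconds for step in steps)


# GPR17 conditions as stated: 94°C 1 min, 56°C 1 min, 72°C 1 min 20 sec, 30 cycles.
GPR17_STEPS = [
    Step("denature", 94.0, 60),
    Step("anneal", 56.0, 60),
    Step("extend", 72.0, 80),
]

if __name__ == "__main__":
    total = program_seconds(GPR17_STEPS, cycles=30)
    print(f"{total} s of hold time ({total / 60:.0f} min)")
```

With these values each cycle holds for 200 s, so 30 cycles amount to 6000 s (100 min) before ramping is taken into account. Another receptor's program differs only in the list of steps, for example a 1.5 min or 2 min extension.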

11. GPR18 (GenBank Accession Number: L42324)

The cDNA for human GPR18 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 54°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-ATAAGATGATCACCCTGAACAATCAAGAT-3' (SEQ.ID.NO.: 33)
and the 3' primer contained an EcoRI site with the sequence:
5'-TCCGAATTCATAACATTTCACTGTTTATATTGC-3' (SEQ.ID.NO.: 34).
The 1.0 kb PCR fragment was digested with EcoRI and cloned into blunt-EcoRI site of pCMV
expression vector. All 8 cDNA clones sequenced contained 4 possible polymorphisms involving
changes of amino acid 12 from Thr to Pro, amino acid 86 from Ala to Glu, amino acid 97 from
Ile to Leu and amino acid 310 from Leu to Met. Aside from these changes, nucleic acid
(SEQ.ID.NO.: 35) and amino acid (SEQ.ID.NO.: 36) sequences for human GPR18 were thereafter
determined and verified.

12. GPR20 (GenBank Accession Number: U66579) 
The cDNA for human GPR20 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-CCAAGCTTCCAGGCCTGGGGTGTGCTGG-3' (SEQ.ID.NO.: 37)
and the 3' primer contained a BamHI site with the sequence:
5'-ATGGATCCTGACCTTCGGCCCCTGGCAGA-3' (SEQ.ID.NO.: 38).
The 1.2 kb PCR fragment was digested with BamHI and cloned into EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 39) and amino acid (SEQ.ID.NO.: 40)
sequences for human GPR20 were thereafter determined and verified.

13. GPR21 (GenBank Accession Number: U66580)

The cDNA for human GPR21 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-GAGAATTGACTGGTGAG(TGAAGATGAACT-3' (SEQ.ID.NO.: 41)
and the 3' primer contained a BamHI site with the sequence:

5'-GGGGATCCCCGTAACTGAGCGACTTGAGAT-3' (SEQ.ID.NO.: 42).
The 1.1 kb PCR fragment was digested with BamHI and cloned into EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 43) and amino acid (SEQ.ID.NO.: 44)
sequences for human GPR21 were thereafter determined and verified.

14. GPR22 (GenBank Accession Number: U66581)

The cDNA for human GPR22 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 50°C
for 1 min; and 72°C for 1.5 min. The 5' PCR primer was kinased with the sequence:
5'-TGGGGGGGGAAAAAAAGGAAGTGGTCCAAA-3' (SEQ.ID.NO.: 45)
and the 3' primer contained a BamHI site with the sequence:
5'-TAGGATGGATTTGAATGTGGATTTC^GTGAAA-3' (SEQ.ID.NO.: 46).
The 1.38 kb PCR fragment was digested with BamHI and cloned into EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 47) and amino acid (SEQ.ID.NO.: 48)
sequences for human GPR22 were thereafter determined and verified.

15. GPR24 (GenBank Accession Number: U71092)

The cDNA for human GPR24 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for
1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site with the
sequence:

5'-GTGAAGCTTGCCTCTGGTGCCTGCAGGAGG-3' (SEQ.ID.NO.: 49)
and the 3' primer contained an EcoRI site with the sequence:
5'-GCAGAATTCCCGGTGGCGTGnGTCJGTGCCC-3' (SEQ.ID.NO.: 50).
The 1.3 kb PCR fragment was digested with HindIII and EcoRI and cloned into HindIII-EcoRI
site of pCMV expression vector. The nucleic acid (SEQ.ID.NO.: 51) and amino acid
(SEQ.ID.NO.: 52) sequences for human GPR24 were thereafter determined and verified.



16. GPR30 (GenBank Accession Number: U63917)

The cDNA for human GPR30 was generated and cloned as follows: the coding
sequence of GPR30 (1128 bp in length) was amplified from genomic DNA using the primers:
5'-GGCGGATCCATGGATGTGACTTCCCAA-3' (SEQ.ID.NO.: 53) and
5'-GGCGGATCCCTACACGGCACTGCTGAA-3' (SEQ.ID.NO.: 54).
The amplified product was then cloned into a commercially available vector, pCR2.1 (Invitrogen),
using a "TOPO-TA Cloning Kit" (Invitrogen, #K4500-01), following the manufacturer's instructions.
The full-length GPR30 insert was liberated by digestion with BamHI, separated from the vector
by agarose gel electrophoresis, and purified using a Sephaglas BandPrep™ Kit (Pharmacia, #27-9285-01)
following the manufacturer's instructions. The nucleic acid (SEQ.ID.NO.: 55) and amino acid
(SEQ.ID.NO.: 56) sequences for human GPR30 were thereafter determined and verified.
17. GPR31 (GenBank Accession Number: U65402) 

The cDNA for human GPR31 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 58°C
for 1 min; and 72°C for 2 min. The 5' PCR primer contained an EcoRI site with the sequence:
5'-AAGGAATTCACGGCCGGGTGATGCCATTCCC-3' (SEQ.ID.NO.: 57)
and the 3' primer contained a BamHI site with the sequence:
5'-GGTGGATCCATAAACACGGGCGTTGAGGAC-3' (SEQ.ID.NO.: 58).
The 1.0 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 59) and amino acid (SEQ.ID.NO.:
60) sequences for human GPR31 were thereafter determined and verified.

18. GPR32 (GenBank Accession Number: AF045764) 

The cDNA for human GPR32 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for
1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-TAAGAATTCCATAAAAA'n ATGGAATGG-3' (SEQ.ID.NO.: 243)

and the 3' primer contained a BamHI site with the sequence:

5'-CCAGGATCCAGCTGAAGTCTTCCATCATTC-3' (SEQ.ID.NO.: 244).

The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 245) and amino acid (SEQ.ID.NO.:
246) sequences for human GPR32 were thereafter determined and verified.

19. GPR40 (GenBank Accession Number: AF024687)

The cDNA for human GPR40 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for
1 min; and 72°C for 1 min and 10 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-GGAGAATTCGGCGGCCCGATGGAGCTGCCCGG-3' (SEQ.ID.NO.: 247)

and the 3' primer contained a BamHI site with the sequence:

5'-GCTGGATCCCCCGAGCAGTGGGGTTACTTC-3' (SEQ.ID.NO.: 248).

The 1 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI site
of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 249) and amino acid (SEQ.ID.NO.: 250)
sequences for human GPR40 were thereafter determined and verified.
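Each primer in these protocols is described as carrying a particular restriction site, which can be checked against the transcribed sequence with a plain substring scan. The sketch below is a hypothetical helper (the names `SITES` and `find_sites` are not from the source) limited to the enzymes named in this Example; because every one of these recognition sequences is palindromic, scanning one strand also covers the other. The GPR40 primers above serve as the example input.

```python
# Recognition sequences (5'->3') for the enzymes named in this Example.
# All six are palindromic, so a single-strand scan finds sites on both strands.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "XhoI": "CTCGAG",
    "EcoRV": "GATATC",
}


def find_sites(primer: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognized site in the primer."""
    seq = primer.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    # GPR40 primers as listed above (SEQ.ID.NO.: 247 and 248).
    print(find_sites("GGAGAATTCGGCGGCCCGATGGAGCTGCCCGG"))
    print(find_sites("GCTGGATCCCCCGAGCAGTGGGGTTACTTC"))
```

For these two primers the scan reports an EcoRI site and a BamHI site, each starting after the three-base 5' clamp. A primer that is said to carry a site but returns an empty result would point to a transcription problem worth rechecking against the sequence listing.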

20. GPR41 (GenBank Accession Number: AF024688)

The cDNA for human GPR41 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for
1 min; and 72°C for 1 min and 10 sec. The 5' PCR primer contained a HindIII site with the
sequence:

5'-CTCAAGCTTACTCTCTCTCACCAGTGGCCAC-3' (SEQ.ID.NO.: 251)
and the 3' primer was kinased with the sequence:
5'-CCCTCCTCCCCCGGAGGACCTAGC-3' (SEQ.ID.NO.: 252).

The 1 kb PCR fragment was digested with HindIII and cloned into HindIII-blunt site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 253) and amino acid (SEQ.ID.NO.: 254)
sequences for human GPR41 were thereafter determined and verified.

21. GPR43 (GenBank Accession Number: AF024690)

The cDNA for human GPR43 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for
1 min; and 72°C for 1 min and 10 sec. The 5' PCR primer contained a HindIII site with the
sequence:

5'-TTTAAGCTTGCCCTCCAGGATGGTGGGGGAC-3' (SEQ.ID.NO.: 255)
and the 3' primer contained an EcoRI site with the sequence:
5'-GGCGAATTGTGAAGGTCCAGGGAAAGTGiGTA-3' (SEQ.ID.NO.: 256).

The 1 kb PCR fragment was digested with HindIII and EcoRI and cloned into HindIII-EcoRI site
of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 257) and amino acid (SEQ.ID.NO.: 258)
sequences for human GPR43 were thereafter determined and verified.

22. APJ (GenBank Accession Number: U03642)

Human APJ cDNA (in pRcCMV vector) was provided by Brian O'Dowd
(University of Toronto). The human APJ cDNA was excised from the pRcCMV vector as an
EcoRI-XbaI (blunted) fragment and was subcloned into EcoRI-SmaI site of pCMV vector.
Nucleic acid (SEQ.ID.NO.: 61) and amino acid (SEQ.ID.NO.: 62) sequences for human APJ
were thereafter determined and verified.

23. BLR1 (GenBank Accession Number: X68149)

The cDNA for human BLR1 was generated and cloned into pCMV expression
vector as follows: PCR was performed using thymus cDNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-TGAGAATTCTGGTGACTCACAGCCGGCACAG-3' (SEQ.ID.NO.: 63)
and the 3' primer contained a BamHI site with the sequence:

5'-GCCGGATCCAAGGAAAAGCAGGAATAAAACG-3' (SEQ.ID.NO.: 64). The 1.2 kb PCR
fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 65) and amino acid (SEQ.ID.NO.: 66) sequences
for human BLR1 were thereafter determined and verified.

24. CEPR (GenBank Accession Number: U77827) 

The cDNA for human CEPR was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-CAAAGCTTGAAAGCTGCACGGTGCAGAGAC-3' (SEQ.ID.NO.: 67)
and the 3' primer contained a BamHI site with the sequence:

5'-GCGGATCCCGAGTCACACCGTGGCTGGGCC-3' (SEQ.ID.NO.: 68).

The 1.2 kb PCR fragment was digested with BamHI and cloned into EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 69) and amino acid (SEQ.ID.NO.: 70)
sequences for human CEPR were thereafter determined and verified.

25. EBI1 (GenBank Accession Number: L31581)

The cDNA for human EBI1 was generated and cloned into pCMV expression
vector as follows: PCR was performed using thymus cDNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-ACAGAATTCCTGTGTGGTTTTAGCGGGCAG-3' (SEQ.ID.NO.: 71)
and the 3' primer contained a BamHI site with the sequence:
5'-GTGGGATCGAGGGAGAAGAGTCGGCTATGG-3' (SEQ.ID.NO.: 72).
The 1.2 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 73) and amino acid (SEQ.ID.NO.:
74) sequences for human EBI1 were thereafter determined and verified.

26. EBI2 (GenBank Accession Number: L08177)

The cDNA for human EBI2 was generated and cloned into pCMV expression
vector as follows: PCR was performed using a cDNA clone (graciously provided by Kevin Lynch,
University of Virginia Health Sciences Center; the vector utilized was not identified by the source)
as template and pfu polymerase (Stratagene) with the buffer system provided by the manufacturer
supplemented with 10% DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4
nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 60°C for 1 min; and 72°C for
1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the sequence:
5'-CTGGAATTCACCTGGACCACCACCAATGGATA-3' (SEQ.ID.NO.: 75)

and the 3' primer contained a BamHI site with the sequence:
5'-CTCGGATCCTGCAAAGTTTGTCATACAGTT-3' (SEQ.ID.NO.: 76).
The 1.2 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 77) and amino acid (SEQ.ID.NO.:
78) sequences for human EBI2 were thereafter determined and verified.

27. ETBR-LP2 (GenBank Accession Number: D38449)

The cDNA for human ETBR-LP2 was generated and cloned into pCMV
expression vector as follows: PCR was performed using brain cDNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 65°C for 1 min; and 72°C for 1.5 min. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-CTGGAATTCTCCTGCTCATCCAGCCATGCGG-3' (SEQ.ID.NO.: 79)
and the 3' primer contained a BamHI site with the sequence:
5'-CCTGGATCCCCACCCCTACTGGGGCCTCAG-3' (SEQ.ID.NO.: 80).
The 1.5 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 81) and amino acid
(SEQ.ID.NO.: 82) sequences for human ETBR-LP2 were thereafter determined and verified.
28. GHSR (GenBank Accession Number: U60179) 

The cDNA for human GHSR was generated and cloned into pCMV expression
vector as follows: PCR was performed using hippocampus cDNA as template and TaqPlus
Precision polymerase (Stratagene) with the buffer system provided by the manufacturer, 0.25 µM
of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of:
94°C for 1 min; 68°C for 1 min; and 72°C for 1 min and 10 sec. For first round PCR, the 5' PCR
primer sequence was:
5'-ATGTGGAACGGGACGCGCAGCG-3' (SEQ.ID.NO.: 83)
and the 3' primer sequence was:

5'-TGATGTATTAATACTAGATTCT-3' (SEQ.ID.NO.: 84).
Two microliters of the first round PCR was used as template for the second round PCR, where the
5' primer was kinased with the sequence:

5'-TACCATGTGGAAGGCGAGGGCCAGGGAAGAGGCGGGGT-3' (SEQ.ID.NO.: 85)

and the 3' primer contained an EcoRI site with the sequence:

5'-CGGAATTCATGTATTAATACTAGATTGTGTCCAGGCCGG-3' (SEQ.ID.NO.: 86).
The 1.1 kb PCR fragment was digested with EcoRI and cloned into blunt-EcoRI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 87) and amino acid (SEQ.ID.NO.: 88) sequences
for human GHSR were thereafter determined and verified.

29. GPCR-CNS (GenBank Accession Number: AF017262)

The cDNA for human GPCR-CNS was generated and cloned into pCMV
expression vector as follows: PCR was performed using brain cDNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 65°C for 1 min; and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the
sequence:

5'-GCAAGCTTGTGCCCTCACCAAGCCATGCGAGCC-3' (SEQ.ID.NO.: 89)
and the 3' primer contained an EcoRI site with the sequence:
5'-CGGAATTCAGCAATGAGTTCCGACAGAAGC-3' (SEQ.ID.NO.: 90).
The 1.9 kb PCR fragment was digested with HindIII and EcoRI and cloned into HindIII-EcoRI
site of pCMV expression vector. All nine clones sequenced contained a potential polymorphism
involving a S284C change. Aside from this difference, nucleic acid (SEQ.ID.NO.: 91) and amino
acid (SEQ.ID.NO.: 92) sequences for human GPCR-CNS were thereafter determined and verified.
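Changes such as S284C are written in the conventional single-letter form: original residue, 1-based position, replacement residue. The minimal Python sketch below parses and applies that notation. `parse_substitution` and `apply_substitution` are hypothetical helper names, and the example sequence is a toy string, not a receptor sequence from this disclosure.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    original: str
    position: int  # 1-based residue number
    replacement: str


def parse_substitution(code: str) -> Substitution:
    """Parse notation such as 'S284C' into its three parts."""
    match = _SUBSTITUTION.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a single-letter substitution: {code!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(protein: str, code: str) -> str:
    """Return a copy of protein with the substitution applied, checking the original residue."""
    sub = parse_substitution(code)
    if not 1 <= sub.position <= len(protein):
        raise ValueError(f"position {sub.position} outside sequence of length {len(protein)}")
    index = sub.position - 1
    if protein[index] != sub.original:
        raise ValueError(f"expected {sub.original} at {sub.position}, found {protein[index]}")
    return protein[:index] + sub.replacement + protein[index + 1:]


if __name__ == "__main__":
    print(parse_substitution("S284C"))
    print(apply_substitution("MSTA", "S2C"))  # toy sequence
```

Checking the original residue before substituting guards against off-by-one numbering, which is the usual source of error when a change reported on one isoform or reference is applied to another.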

30. GPR-NGA (GenBank Accession Number: U55312)

The cDNA for human GPR-NGA was generated and cloned into pCMV
expression vector as follows: PCR was performed using genomic DNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 56°C for 1 min; and 72°C for 1.5 min. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-CAGAATTCAGAGA.^AAAAAGTGAATATGGTTTTT-3' (SEQ.ID.NO.: 93)
and the 3' primer contained a BamHI site with the sequence:

5'-TTGGATCCCTGGTGCATAACAATTGAAAGAAT-3' (SEQ.ID.NO.: 94).

The 1.3 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 95) and amino acid (SEQ.ID.NO.:
96) sequences for human GPR-NGA were thereafter determined and verified.
31. H9 (GenBank Accession Number: U52219) 

The cDNA for human H9 was generated and cloned into pCMV expression
vector as follows: PCR was performed using pituitary cDNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C for
1 min; and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the sequence:
5'-GiGAAAGGTTAACGATGGGGAGGAGGAAGAT-3' (SEQ.ID.NO.: 97)
and the 3' primer contained a BamHI site with the sequence:
5'-GTCGGATGCTACGAGAGGATTTTTCACACAG-3' (SEQ.ID.NO.: 98).

The 1.9 kb PCR fragment was digested with HindIII and BamHI and cloned into HindIII-BamHI
site of pCMV expression vector. When compared to the published sequences, a
different isoform with a 12 bp in-frame insertion in the cytoplasmic tail was also identified and
designated "H9b." Both isoforms contain two potential polymorphisms involving changes
of amino acid P320S and amino acid G448A. Isoform H9a contained another potential
polymorphism of amino acid S493N, while isoform H9b contained two additional potential
polymorphisms involving changes of amino acid I502T and amino acid A532T
(corresponding to amino acid 528 of isoform H9a). Nucleic acid (SEQ.ID.NO.: 99) and
amino acid (SEQ.ID.NO.: 100) sequences for human H9 were thereafter determined and
verified (in the section below, both isoforms were mutated in accordance with the Human
GPCR Proline Marker Algorithm).
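The correspondence between A532T in H9b and residue 528 of H9a follows from the size of the insertion: 12 bp in frame is 12 / 3 = 4 codons, so residues C-terminal to the insertion are numbered 4 higher in H9b. A small Python sketch of that offset follows; the helper names are hypothetical, and the mapping applies only to positions downstream of the insertion, whose exact location in the cytoplasmic tail is not stated here.

```python
def inserted_residues(insertion_bp: int) -> int:
    """Number of amino acids added by an in-frame insertion."""
    if insertion_bp % 3 != 0:
        raise ValueError("insertion is not in frame")
    return insertion_bp // 3


def h9b_to_h9a(position_b: int, insertion_bp: int = 12) -> int:
    """Map an H9b residue number C-terminal to the insertion onto H9a numbering.

    Assumes position_b lies downstream of the insertion; positions upstream
    are numbered identically in both isoforms and should not be passed here.
    """
    return position_b - inserted_residues(insertion_bp)


if __name__ == "__main__":
    print(h9b_to_h9a(532))  # A532T in H9b
```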

32. HB954 (GenBank Accession Number: D38449)

The cDNA for human HB954 was generated and cloned into pCMV expression
vector as follows: PCR was performed using brain cDNA as template and rTth polymerase (Perkin
Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 58°C for 1 min;
and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the sequence:
5'-TCCAAGCTTCGCCATGGGACATr\ACGGGAGCT-3' (SEQ.ID.NO.: 101)
and the 3' primer contained an EcoRI site with the sequence:
5'-CGTGAATTCCAAGAATTTACAATCC7TGCT-3' (SEQ.ID.NO.: 102).
The 1.6 kb PCR fragment was digested with HindIII and EcoRI and cloned into HindIII-EcoRI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 103) and amino acid
(SEQ.ID.NO.: 104) sequences for human HB954 were thereafter determined and verified.
33. HG38 (GenBank Accession Number: AF062006)

The cDNA for human HG38 was generated and cloned into pCMV expression
vector as follows: PCR was performed using brain cDNA as template and rTth polymerase (Perkin
Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for 1 min; and
72°C for 1 min and 30 sec. Two PCR reactions were performed to separately obtain the 5' and
3' fragments. For the 5' fragment, the 5' PCR primer contained a HindIII site with the sequence:
5'-CCCAAGCTTCGGGCACCATGGACACCTCCC-3' (SEQ.ID.NO.: 259)
and the 3' primer contained a BamHI site with the sequence:
5'-ACAGGATCCAAATGCACAGCACTGGTAAGC-3' (SEQ.ID.NO.: 260).

This 5' 1.5 kb PCR fragment was digested with HindIII and BamHI and cloned into a HindIII-BamHI
site of pCMV. For the 3' fragment, the 5' PCR primer was kinased with the sequence:
5'-CTATAACTGGGTTACATGGTTTAAC-3' (SEQ.ID.NO.: 261)
and the 3' primer contained an EcoRI site with the sequence:
5'-TnGAATTCACATATTAATTAGAGACATGG-3' (SEQ.ID.NO.: 262).
The 1.4 kb 3' PCR fragment was digested with EcoRI and subcloned into a blunt-EcoRI site of
pCMV vector. The 5' and 3' fragments were then ligated together through a common EcoRV site
to generate the full-length cDNA clone. Nucleic acid (SEQ.ID.NO.: 263) and amino acid
(SEQ.ID.NO.: 264) sequences for human HG38 were thereafter determined and verified.
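Joining the 5' and 3' HG38 fragments through their common EcoRV site can be modeled in silico. EcoRV recognizes GATATC and cuts between GAT and ATC, leaving blunt ends, so the joined sequence keeps the 5' fragment up to the cut and the 3' fragment from the cut onward. The Python sketch below is a simplification under stated assumptions: the helper name is hypothetical, each fragment is assumed to contain the site exactly once, vector sequence and end chemistry are ignored, and the example strings are toy sequences, not HG38 sequence.

```python
ECORV_SITE = "GATATC"
ECORV_CUT = 3  # EcoRV cuts GAT^ATC, producing blunt ends


def join_at_ecorv(five_prime: str, three_prime: str) -> str:
    """Splice two sequences at a shared, single-copy EcoRV site (in silico)."""
    a, b = five_prime.upper(), three_prime.upper()
    for name, seq in (("5'", a), ("3'", b)):
        if seq.count(ECORV_SITE) != 1:
            raise ValueError(f"{name} fragment must contain exactly one EcoRV site")
    left = a[: a.index(ECORV_SITE) + ECORV_CUT]
    right = b[b.index(ECORV_SITE) + ECORV_CUT:]
    return left + right


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(join_at_ecorv("AAAGATATCCC", "GGGATATCTTT"))
```

Requiring exactly one copy of the site in each fragment mirrors the practical requirement that the shared site be unique, so that the junction is unambiguous.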



34. HM74 (GenBank Accession Number: D10923)

The cDNA for human HM74 was generated and cloned into pCMV expression
vector as follows: PCR was performed using either genomic DNA or thymus cDNA (pooled) as
template and rTth polymerase (Perkin Elmer) with the buffer system provided by the
manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle
condition was 30 cycles of: 94°C for 1 min; 65°C for 1 min; and 72°C for 1 min and 20 sec. The
5' PCR primer contained an EcoRI site with the sequence:
5'-GGAGA.ATTGAGTAGGGGAGGCGGTCGATC-3' (SEQ.ID.NO.: 105)
and the 3' primer was kinased with the sequence:

5'-GGAGGATCGAGGAAAGCTTAGGjGGGAGTCG-3' (SEQ.ID.NO.: 106).
The 1.3 kb PCR fragment was digested with EcoRI and cloned into EcoRI-SmaI site of
pCMV expression vector. Clones sequenced revealed a potential polymorphism involving a
N94K change. Aside from this difference, nucleic acid (SEQ.ID.NO.: 107) and amino acid
(SEQ.ID.NO.: 108) sequences for human HM74 were thereafter determined and verified.

35. MIG (GenBank Accession Numbers: AF044600 and AF044601)

The cDNA for human MIG was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and TaqPlus Precision
polymerase (Stratagene) for first round PCR or pfu polymerase (Stratagene) for second round PCR
with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
(TaqPlus Precision) or 0.5 mM (pfu) of each of the 4 nucleotides. When pfu was used, 10%
DMSO was included in the buffer. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C
for 1 min; and 72°C for: (a) 1 min for first round PCR; and (b) 2 min for second round PCR.
Because there is an intron in the coding region, two sets of primers were separately used to
generate overlapping 5' and 3' fragments. The 5' fragment PCR primers were:

5'-ACCATGGCTTGCAATGGCAGTGCGGCCAGGGGGCACT-3' (external sense)
(SEQ.ID.NO.: 109)
and
5'-CGACCAGGACAAACAGCATCTTGGTCACTTGTCTCCGGC-3' (internal antisense)
(SEQ.ID.NO.: 110).

The 3' fragment PCR primers were:

5'-GACCAAGATGCTGTTTGTCCTGGTCGTGGTGTTTGGCAT-3' (internal sense)
(SEQ.ID.NO.: 111) and

5'-CGGAATTCAGGATGGATCGGTCTCTTGCTGCGCCT-3' (external antisense with an
EcoRI site) (SEQ.ID.NO.: 112).

The 5' and 3' fragments were ligated together by using the first round PCR as template and the
kinased external sense primer and external antisense primer to perform second round PCR. The
1.2 kb PCR fragment was digested with EcoRI and cloned into the blunt-EcoRI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 113) and amino acid (SEQ.ID.NO.: 114)
sequences for human MIG were thereafter determined and verified.
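The internal primers are what make the 5' and 3' MIG fragments overlap: the 5' fragment ends with the reverse complement of the internal antisense primer, and the 3' fragment begins with the internal sense primer. The Python sketch below (hypothetical helper names) measures that shared stretch from the primer sequences listed above, using the full internal antisense primer (SEQ.ID.NO.: 110) and the first 30 nucleotides of the internal sense primer (SEQ.ID.NO.: 111).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A/C/G/T (any case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of left that is also a prefix of right."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


# Internal primers for MIG as listed above.
INTERNAL_ANTISENSE = "CGACCAGGACAAACAGCATCTTGGTCACTTGTCTCCGGC"  # SEQ.ID.NO.: 110
INTERNAL_SENSE_5P = "GACCAAGATGCTGTTTGTCCTGGTCGTGGT"  # first 30 nt of SEQ.ID.NO.: 111

if __name__ == "__main__":
    # Top-strand end of the 5' fragment vs. top-strand start of the 3' fragment.
    print(overlap_length(reverse_complement(INTERNAL_ANTISENSE), INTERNAL_SENSE_5P))
```

With the sequences as listed, the shared stretch is 26 nucleotides. That is consistent with the first-round products being joined by overlap extension in the second-round PCR driven by the two external primers.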

36. OGR1 (GenBank Accession Number: U48405)

The cDNA for human OGR1 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-CGAAGCTTCAGGCCCAAAGATTTGGGAACAT-3' (SEQ.ID.NO.: 115)
and the 3' primer contained a BamHI site with the sequence:

5'-GTGGATCCACCCGCGGAGGACCCAGGCTAG-3' (SEQ.ID.NO.: 116).
The 1.1 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site
of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 117) and amino acid (SEQ.ID.NO.:
118) sequences for human OGR1 were thereafter determined and verified.
37. Serotonin 5HT2A

The cDNA encoding endogenous human 5HT2A receptor was obtained by RT-PCR
using human brain poly-A+ RNA; a 5' primer from the 5' untranslated region with an XhoI
restriction site:

5'-GACCTCGAGTCCTTCTACACCTCATC-3' (SEQ.ID.NO.: 119)
and a 3' primer from the 3' untranslated region containing an XbaI site:
5'-TGCTCTAGATTCCAGATAGGTGAAAACTTG-3' (SEQ.ID.NO.: 120)
PCR was performed using either TaqPlus™ Precision polymerase (Stratagene) or rTth™
polymerase (Perkin Elmer) with the buffer system provided by the manufacturers, 0.25 µM of each
primer and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 57°C for 1 min; and 72°C for 2 min. The 1.5 kb PCR fragment was digested with XbaI
and subcloned into EcoRV-XbaI site of pBluescript. The resulting cDNA clones were fully
sequenced and found to encode two amino acid changes from the published sequences. The first
one was a T25N mutation in the N-terminal extracellular domain; the second is an H452Y
mutation. Because cDNA clones derived from two independent PCR reactions using Taq
polymerase from two different commercial sources (TaqPlus™ from Stratagene and rTth™ from Perkin
Elmer) contained the same two mutations, these mutations are likely to represent sequence
polymorphisms rather than PCR errors. With these exceptions, the nucleic acid (SEQ.ID.NO.:
121) and amino acid (SEQ.ID.NO.: 122) sequences for human 5HT2A were thereafter determined
and verified.

38. Serotonin 5HT2C

The cDNA encoding endogenous human 5HT2C receptor was obtained from
human brain poly-A+ RNA by RT-PCR. The 5' and 3' primers were derived from the 5' and 3'
untranslated regions and contained the following sequences:
5'-GACCTCGAGGTTGCTTAAGACTGAAGCo^ (SEQ.ID.NO.: 123)
5'-ATTTCTAGACATATGTAGCTTGTACCG-3' (SEQ.ID.NO.: 124)
Nucleic acid (SEQ.ID.NO.: 125) and amino acid (SEQ.ID.NO.: 126) sequences for human 5HT2C
were thereafter determined and verified.

39. V28 (GenBank Accession Number: U20350) 

The cDNA for human V28 was generated and cloned into pCMV expression
vector as follows: PCR was performed using brain cDNA as template and rTth polymerase (Perkin
Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for 1 min;
and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site with the sequence:
5'-GGTAAGCTTGGCAGTCCACGCCAGGCCTTC-3' (SEQ.ID.NO.: 127)
and the 3' primer contained an EcoRI site with the sequence:

5'-TCCGAATTCTCTGTAGACACA.AGGCTTTGG-3' (SEQ.ID.NO.: 128)
The 1.1 kb PCR fragment was digested with HindIII and EcoRI and cloned into HindIII-EcoRI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 129) and amino acid (SEQ.ID.NO.:
130) sequences for human V28 were thereafter determined and verified.



Example 2

PREPARATION OF NON-ENDOGENOUS HUMAN GPCRs

1. Site-Directed Mutagenesis

Mutagenesis based upon the Human GPCR Proline Marker approach disclosed herein was
performed on the foregoing endogenous human GPCRs using the Transformer Site-Directed
Mutagenesis Kit (Clontech) according to the manufacturer's instructions. For this mutagenesis
approach, a Mutation Probe and a Selection Marker Probe (unless otherwise indicated, the probe
of SEQ.ID.NO.: 132 was the same throughout) were utilized, and the sequences of these for the
specified sequences are listed below in Table B (the parenthetical number is the SEQ.ID.NO.).
For convenience, the codon mutation incorporated into the human GPCR is also noted, in standard
form:

Table B

Receptor Identifier   Mutation Probe Sequence (5'-3')          Selection Marker Probe Sequence (5'-3')
(Codon Mutation)      (SEQ.ID.NO.)                             (SEQ.ID.NO.)

GPR1 (F245K)          GATCTCCAGTAGGCATAAGTGGACAATTCTGG (131)   CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (132)
GPR4 (K223A)          AGAAGGCC\AGATCGCGCGGCTGGCCCTCA (133)     CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR5 (V224K)          C(3GCGCCACCGCACGAAAAAGCTCATCTTC          CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
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GPR7 
('I250K) 



GPRS 

(T259K) 



30 GPR5> 
(MZ54K) 



(L241K) 



j GPRIO 
3,1 {F276K) 



GPR15 
{I240K) 



GPR17 

{V234K) 



40 GPR18 
(r231K) 



GPR20 
(M240K) 



GPR21 
45 (A251K) 



I GPR22 
' (F312K) 



GPR24 
(T304K1 



50 GPR3() 
(L258K) 



GPR31 

(Q221K.) 



GPR32 

5^ (K255A) 



GPR40 
(A223K) 



OPR41 



(134) ll^^,^,!^ 
ijCC\\.MjAAGCCj( ^GlGA.\Gri '\ CTCCTTCGGTCCTCCTA rCO'I 
CCTGGTGOTGGt^A | TGTCAG.\AGT 

(135) 



C A G(X'GGAAGGTG AAA GTCC 

TGGTCCTCGT 

(136) 



CTCCTTCCtGI CCrCCTATCGT 
TGTCAGAAGT 



CGGCGCCTGCX iGGCC\V\G€G | OCCTTCGGTCCTCCTATCGT 
GCIGGTGGTGGTG I 
(137) 



TGTCAGAAGT 



ccaagcac:aaagccaagaaa 

GTGACCArCAC 
(138) 



CTGCn CGGTCCTCCT A'l CGY 

igtcagaagt 



I gggccggcgcaccaaatcxt ctccttcggicctcctatcgt 

' TGCTGGTGGT i TGTCAGAAGT 
I '139) ~ 



CAAAAAGCJ GA.AGA.\ATCTA 
AGAAGATCATCTTTA ITG rCG 
(140) 



CAAGACCA.^GOCAAAACC/CA 

IGATCGCCAT 

(141) 



GTCA AGGAG AAG7 CC AAAAG 
GAICATCAIC 

jUI) 

CGCCGCGTOCGGGCC AAC i CA 

(JCTCCrGClC 

(143) 



CI CCTTC( jGTGCTCCrATCGT 
TGTCAGAAGT 



CrCCITCGGTCCTCa'A'JCG r 
IGTCAGAAGr 

CTC CITCGGTCCTCCTATCG'I 
TGTCAGAAGT 



CTCClTCCiGTCCTCCTATCGT 
TGTCAGAAGT 



CCTGAT/\AGCGCTATAAAAT 
(KjTCCTG'ITTCGA 
iJ144) 



CTC CTTCGG'l CCTCCTA TCGT 
'IGTCAGA.\GT 



(iAAAGAC/\A.A/\GAGAGTCA 
AG AGG ATGTCTTITATI G 
(145) 



COGAGAAAGAGGGTGAAAC 

CiCACAGCCATCGCC 

(146) 



CrCCTTCGGTCCTCCrATCGT 
TGTCAGAAGT 



CrCCTTCGGTCCTCCrATCGT 
TGTCAGAAGT 



alternate approach; see below alternate approach; see below 



AAGCTTCAGCGGGCC AAG GC 
ACTGGTCACC 

(liZi 



CATGCCAACCGGCCCGCGAG 

CCTGCrGCTGGT 

(279) 



cggaagci'gcgggccaaatg 
ggtggcccxk: 

(265) 



CAGAGGAGGGTC ; AAGCjGGCr 

gttggcg 



CTCCTTCGGTCCrCCT ATCGT 
TGTCAGAAGT 



ACCAC<:AGCAGCCrCGCGGG 

CCGGITGGCATG 

(280) 



CTCCTICGGTCCTCCTATCGT 
TGTCAGAAGT 



Ci CCTTCCK3TCCTCCrATCGT 
TGTCAGAAGT 
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10 
15 


(A223K) 


(206) i 


GPR43 
(V221K) 


CtGCGGCGCCGAGCC a AGGGG 
CTGGCTGTGG 
1 (267) 


CTCrTTC:GGTCCrCCrATCG I 
TG'ICAGAAGT 


APJ 
> (L247K) 


' alternate approach; see beiow 


alternate approach; see below 


BLR I 

(V258K) 


CAGCGGCAGAAGGGAAAA^\ 

GGG^'GGCCATC 

fM8) 


J CTCCTTC GG I CCrcCTA rCG l 
' IGICAGAAGT 


CEPR 
(1-258K) 


CGGCAG AAGGCG AAGCGC : A'l 
. GATCCTCGCG 
! (149) 


CTCCTTCGG^rCCTCCrATCGl 
TGICAGAAGT 






1( 

20 

25 It 
30 


)EBI1 

{I262K) 


GAGCGCAACAAGGCCAAA.\ | CiCCTTCGGTCCrcCTATCGT 

AGGTGATCATC 1 T(~;TCAGAAGI 

{150) 


EBI2 

(X243KJ 


GGTGTAAACAAAAAGGCTAA 

AAACACAATTATTCTTATI 

(151) 


CTCCnCGGTCCrCCI ATCGT 
njTCAGA.AGT 


ETBR-LP2 : (^AGAGCCAGCJl'AAGAGCAC 
as358K) i CGTGGTG 
: (152) 


CTCCTTCGGTCCTCCTATCGT 
IGICAGAAGI 


GHSR 
(V262K) 


CCACA^\GCA A ACC A AG AAA.^ 

TOCTGGCrG] 

(153) 


CTCCnCGG ICCrCCTATCGT 
KiTCAGAAGT 


GPCR-CNS 
(N491K) 


CI AGAGAGTCAGATG A AG I G 

TACAGTAGTGGCAC 

(155) 


ClCCTTCGGrccrCCTATCGT 
TCilCAG.^AGT 




GPR-NGA 
(I275K) 


CGGACAAAAGTGA.\AA CI AA 

A^GATCnCCrCATT 

(156) 


CTCCTTCGGTCCTCCTATCGT 
IGICAGAAGT 

1 


35 

2t 


H9a and I i9b 

(F236K) 


GCrGAGGTTCGCAATAAACr , 
AACCATGnTGTG ! 
(157) 


CTCCTTCGGTCCICCI ATCGT ! 
TGTCAGA.\Gr 


HB954 
(H265K) 


GGGAGGCCGAGCTGAAAGCC: 

ACCCTGCTC 

(158) 


CTCCTTCGGTCCTCCI ATCGT 
TGTCAGAAGT 


40 : 


HG38 
(V765K) 


G(jK3ACrGCrCrATGAAAAAA 
CACATTGCCCTG , 
(268) 


CAICAAGTGTATCATG'J^GCC i 
AAGTACGCCC ' 
(154) j 



3C 

45 

50 3- 


HM74 1 CAAGATCAAGAGAGCCAAAA 
(I230K) 1 CCTTCATCATG 

: (159) 


CT C:CrrCGGTCCTCCTATCGT 
TGTCAGAAGT 


1 MIG 
(T273K) 


CCGGAGACAAGTGAAGAAG 

ATGCTGTTTGTC 

(U)0) 


CTCCITCGGTCCTCCTATCGT 
TGTCAGAAGT 


OGR] 
(Q227K) 


GCAAGGACCAGATCAAGCCjG ; CTCCnCCiGTCCTCCTA'ICGT 
CTGGTGCTCA | TGTCAGAAGT 

(161) ; 


Serotonin SHT,^ 
(C322K) 


alternate approach; sec below 


alternate approach; see beiow 


Serotonin SHT^t 
(S310K) 


ahemate approach; see beluvv 


allemate approach; see below 
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1 V28 


CAAG Ay\AGCC AA A GC:C AAG 


CrCCTTCGGTCCTrCIA I C G I 


10 


(I230K) 


AAACT(3A'ICCTTC]G 


TGTCAGAAGr 


(162) 
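Each row of Table B pairs a codon change written in standard single-letter form (wild-type residue, position, substituted residue) with a mutation probe. The sketch below, with hypothetical helper names, parses that notation and translates a probe in a chosen reading frame using the standard genetic code. It assumes the GPR31 (Q221K) probe of SEQ.ID.NO.: 147 as read above and that frame 0 is the coding frame; the source does not state the frame.

```python
import re

BASES = "TCAG"
# Standard genetic code laid out in TCAG order: index = 16*first + 4*second + third.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_mutation(code: str) -> tuple:
    """Split notation such as 'Q221K' into (wild-type, position, substitute)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from `frame`; codons with unknown bases (e.g. N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(frame, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    wild_type, position, substitute = parse_mutation("Q221K")
    probe = "AAGCTTCAGCGGGCCAAGGCACTGGTCACC"  # GPR31 mutation probe, SEQ.ID.NO.: 147
    print(wild_type, position, substitute, translate(probe))  # Q 221 K KLQRAKALVT
```

The translated probe contains the substituted lysine (K) flanked by residues read from the same frame, which is a quick plausibility check on a transcribed probe.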
The non-endogenous human GPCRs were then sequenced, and the derived and verified nucleic
acid and amino acid sequences are listed in the accompanying "Sequence Listing" appendix
to this patent document, as summarized in Table C below:

Table C

Mutated GPCR               Nucleic Acid Sequence Listing    Amino Acid Sequence Listing

GPR1 (F245K)               SEQ.ID.NO.: 163                  SEQ.ID.NO.: 164
GPR4 (K223A)               SEQ.ID.NO.: 165                  SEQ.ID.NO.: 166
GPR5 (V224K)               SEQ.ID.NO.: 167                  SEQ.ID.NO.: 168
GPR7 (T250K)               SEQ.ID.NO.: 169                  SEQ.ID.NO.: 170
GPR8 (T259K)               SEQ.ID.NO.: 171                  SEQ.ID.NO.: 172
GPR9 (M254K)               SEQ.ID.NO.: 173                  SEQ.ID.NO.: 174
GPR9-6 (L241K)             SEQ.ID.NO.: 175                  SEQ.ID.NO.: 176
GPR10 (F276K)              SEQ.ID.NO.: 177                  SEQ.ID.NO.: 178
GPR15 (I240K)              SEQ.ID.NO.: 179                  SEQ.ID.NO.: 180
GPR17 (V234K)              SEQ.ID.NO.: 181                  SEQ.ID.NO.: 182
GPR18 (I231K)              SEQ.ID.NO.: 183                  SEQ.ID.NO.: 184
GPR20 (M240K)              SEQ.ID.NO.: 185                  SEQ.ID.NO.: 186
GPR21 (A251K)              SEQ.ID.NO.: 187                  SEQ.ID.NO.: 188
GPR22 (F312K)              SEQ.ID.NO.: 189                  SEQ.ID.NO.: 190
GPR24 (T304K)              SEQ.ID.NO.: 191                  SEQ.ID.NO.: 192
GPR30 (L258K)              SEQ.ID.NO.: 193                  SEQ.ID.NO.: 194
GPR31 (Q221K)              SEQ.ID.NO.: 195                  SEQ.ID.NO.: 196
GPR32 (K255A)              SEQ.ID.NO.: 269                  SEQ.ID.NO.: 270
GPR40 (A223K)              SEQ.ID.NO.: 271                  SEQ.ID.NO.: 272
GPR41 (A223K)              SEQ.ID.NO.: 273                  SEQ.ID.NO.: 274
GPR43 (V221K)              SEQ.ID.NO.: 275                  SEQ.ID.NO.: 276
APJ (L247K)                SEQ.ID.NO.: 197                  SEQ.ID.NO.: 198
BLR1 (V258K)               SEQ.ID.NO.: 199                  SEQ.ID.NO.: 200
CEPR (L258K)               SEQ.ID.NO.: 201                  SEQ.ID.NO.: 202
EBI1 (I262K)               SEQ.ID.NO.: 203                  SEQ.ID.NO.: 204
EBI2 (L243K)               SEQ.ID.NO.: 205                  SEQ.ID.NO.: 206
ETBR-LP2 (N358K)           SEQ.ID.NO.: 207                  SEQ.ID.NO.: 208
GHSR (V262K)               SEQ.ID.NO.: 209                  SEQ.ID.NO.: 210
GPCR-CNS (N491K)           SEQ.ID.NO.: 211                  SEQ.ID.NO.: 212
GPR-NGA (I275K)            SEQ.ID.NO.: 213                  SEQ.ID.NO.: 214
H9a (F236K)                SEQ.ID.NO.: 215                  SEQ.ID.NO.: 216
H9b (F236K)                SEQ.ID.NO.: 217                  SEQ.ID.NO.: 218
HB954 (H265K)              SEQ.ID.NO.: 219                  SEQ.ID.NO.: 220
HG38 (V765K)               SEQ.ID.NO.: 277                  SEQ.ID.NO.: 278
HM74 (I230K)               SEQ.ID.NO.: 221                  SEQ.ID.NO.: 222
MIG (T273K)                SEQ.ID.NO.: 223                  SEQ.ID.NO.: 224
OGR1 (Q227K)               SEQ.ID.NO.: 225                  SEQ.ID.NO.: 226
Serotonin 5HT2A (C322K)    SEQ.ID.NO.: 227                  SEQ.ID.NO.: 228
Serotonin 5HT2C (S310K)    SEQ.ID.NO.: 229                  SEQ.ID.NO.: 230
V28 (I230K)                SEQ.ID.NO.: 231                  SEQ.ID.NO.: 232
2. Alternate Mutation Approaches for Employment of the Proline Marker
Algorithm: APJ; Serotonin 5HT2A; Serotonin 5HT2C; and GPR30

Although the above site-directed mutagenesis approach is particularly preferred, other
approaches can be utilized to create such mutations; those skilled in the art are readily credited
with selecting an approach to mutating a GPCR that fits within the particular needs of the artisan.

a. APJ

Preparation of the non-endogenous, human APJ receptor was accomplished by
mutating L247K. Two oligonucleotides containing this mutation were synthesized:
5'-GGCTTAAGAGCATCATCGTGGTGCTGGTG-3' (SEQ.ID.NO.: 233)
5'-GTCACCACCAGCACCACGATGATGCTCTTAAGCC-3' (SEQ.ID.NO.: 234)

The two oligonucleotides were annealed and used to replace the NaeI-BstEII fragment of human,
endogenous APJ to generate the non-endogenous version of human APJ.
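Whether two synthetic oligonucleotides anneal into a cassette with the intended sticky end can be checked by comparing one strand with the reverse complement of the other. In this minimal sketch (helper names hypothetical), the sequences are SEQ.ID.NO.: 233 and 234 as given above, and the duplex is assumed to be blunt at the end where the 5' end of SEQ.ID.NO.: 233 lies. The function reports the single-stranded 5' extension left on the second strand; for these oligonucleotides that extension is GTCAC, which fits the GTNAC overhang left by BstEII (G^GTNACC).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str):
    """Return the 5' extension on `bottom` if it pairs fully with `top` plus an overhang.

    Assumes the duplex is blunt at the other end (top's 5' end meets bottom's 3' end).
    Returns None if the strands do not pair in that arrangement.
    """
    paired = reverse_complement(top)
    bottom = bottom.upper()
    if not bottom.endswith(paired):
        return None
    return bottom[: len(bottom) - len(paired)]


OLIGO_233 = "GGCTTAAGAGCATCATCGTGGTGCTGGTG"
OLIGO_234 = "GTCACCACCAGCACCACGATGATGCTCTTAAGCC"

if __name__ == "__main__":
    print(five_prime_overhang(OLIGO_233, OLIGO_234))  # GTCAC
```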
b. Serotonin 5HT2A

cDNA containing the point mutation C322K was constructed by utilizing the restriction
enzyme site SphI, which encompasses amino acid 322. A primer containing the C322K mutation:
5'-CAAAGAAAGTACTGGGCATCGTCTTCTTCCT-3' (SEQ.ID.NO.: 235)
was used along with a primer from the 3' untranslated region of the receptor:
5'-TGCTCTAGATTCCAGATAGGTGAAAACTTG-3' (SEQ.ID.NO.: 236)
to perform PCR (under the conditions described above). The resulting PCR fragment was then
used to replace the 3' end of endogenous 5HT2A cDNA through the T4 polymerase-blunted SphI
site.

c. Serotonin 5HT2C
The cDNA containing an S310K mutation was constructed by replacing the StyI restriction
fragment containing amino acid 310 with synthetic double-stranded oligonucleotides that encode
the desired mutation. The sense strand utilized had the following sequence:

5'-CTAGGGGCACCATGCAGGCTATCAACAATGAAAGAAAAGCTAAGAAAGTC-3'
(SEQ.ID.NO.: 237)

and the antisense strand utilized had the following sequence:
5'-CAAGGACTTTCTTAGCTTTTCTTTCATTGTTGATAGCCTG[...]-3'
(SEQ.ID.NO.: 238)

d. GPR30

Prior to generating non-endogenous GPR30, several independent pCR2.1/GPR30 isolates
were sequenced in their entirety in order to identify clones with no PCR-generated mutations. A
clone having no mutations was digested with EcoRI, and the endogenous GPR30 cDNA fragment
was transferred into the CMV-driven expression plasmid pCI-neo (Promega) by digesting pCI-neo
with EcoRI and subcloning the EcoRI-liberated GPR30 fragment from pCR2.1/GPR30, to
generate pCI/GPR30. Thereafter, the leucine at codon 258 was mutated to a lysine using a
QuikChange™ Site-Directed Mutagenesis Kit (Stratagene, #200518), according to the manufacturer's
instructions, and the following primers:

5'-CGGCGGCAGAAGGCGAAACGCATGATCCTCGCGGT-3' (SEQ.ID.NO.: 239) and
5'-ACCGCGAGGATCATGCGTTTCGCCTTCTGCCGCCG-3' (SEQ.ID.NO.: 240)
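Paired mutagenic primers of this kind are normally designed as exact reverse complements, with the altered codon near the middle. The sketch below checks both properties for SEQ.ID.NO.: 239 and 240. It assumes that the introduced lysine codon is the first AAA in the forward primer; that position is inferred from the sequence, not stated in the text. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flank_lengths(primer: str, codon: str) -> tuple:
    """Number of bases 5' and 3' of the first occurrence of `codon` in `primer`."""
    start = primer.upper().find(codon.upper())
    if start == -1:
        raise ValueError(f"{codon} not found in primer")
    return start, len(primer) - start - len(codon)


FORWARD = "CGGCGGCAGAAGGCGAAACGCATGATCCTCGCGGT"  # SEQ.ID.NO.: 239
REVERSE = "ACCGCGAGGATCATGCGTTTCGCCTTCTGCCGCCG"  # SEQ.ID.NO.: 240

if __name__ == "__main__":
    print(reverse_complement(FORWARD) == REVERSE)  # True
    print(flank_lengths(FORWARD, "AAA"))  # (15, 17)
```

The AAA codon starts at a multiple of three, so it lies in the same frame as the primer's first codon, with roughly equal flanking sequence on each side.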
Example 3 

Receptor (Endogenous and Mutated) Expression 

Although a variety of cells are available to the art for the expression of proteins, it is most
preferred that mammalian cells be utilized. The primary reason for this is predicated upon
practicalities, i.e., utilization of, e.g., yeast cells for the expression of a GPCR, while possible,
introduces into the protocol a non-mammalian cell which may not (indeed, in the case of
yeast, does not) include the receptor-coupling, genetic-mechanism and secretory pathways that
have evolved for mammalian systems; thus, results obtained in non-mammalian cells, while
of potential use, are not as preferred as those obtained from mammalian cells. Of the
mammalian cells, COS-7, 293 and 293T cells are particularly preferred, although the specific
mammalian cell utilized can be predicated upon the particular needs of the artisan.

Unless otherwise noted herein, the following protocol was utilized for the expression
of the endogenous and non-endogenous human GPCRs. Table D lists the mammalian cell and
number utilized (per 150mm plate) for GPCR expression.

Table D

Receptor Name                      Mammalian Cell
(Endogenous or Non-Endogenous)     (Number Utilized; exponents illegible in source)

GPR17                              293 (2 x 10^?)
GPR30                              293 (4 x 10^?)
APJ                                COS-7 (5 x 10^?)
ETBR-LP2                           293 (1 x 10^?); 293T (1 x 10^?)
GHSR                               293 (1 x 10^?); 293T (1 x 10^?)
MIG                                293 (1 x 10^?)
Serotonin 5HT2A                    293T (1 x 10^?)
Serotonin 5HT2C                    293T (1 x 10^?)



On day one, mammalian cells were plated out. On day two, two reaction tubes were
prepared (the proportions to follow for each tube are per plate): tube A was prepared by mixing
20µg DNA (e.g., pCMV vector; pCMV vector with endogenous receptor cDNA; and pCMV
vector with non-endogenous receptor cDNA) in 1.2ml serum-free DMEM (Irvine Scientific,
Irvine, CA); tube B was prepared by mixing 120µl lipofectamine (Gibco BRL) in 1.2ml serum-free
DMEM. Tubes A and B were then admixed by inversion (several times), followed by
incubation at room temperature for 30-45 min. The admixture is referred to as the "transfection
mixture". Plated cells were washed with 1X PBS, followed by addition of 10ml serum-free
DMEM. 2.4ml of the transfection mixture was then added to the cells, followed by incubation
for 4 hrs at 37°C/5% CO2. The transfection mixture was then removed by aspiration, followed by
the addition of 25ml of DMEM/10% Fetal Bovine Serum. Cells were then incubated at 37°C/5%
CO2. After 72 hr incubation, cells were harvested and utilized for analysis.
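The per-plate quantities above scale linearly with the number of 150mm plates. A minimal sketch follows; the helper is hypothetical and assumes no pipetting overage unless one is supplied. It also reports the lipofectamine-to-DNA ratio implied by the protocol (120µl per 20µg, i.e., 6µl per µg).

```python
# Per-plate quantities taken from the protocol above (150mm plate).
PER_PLATE = {
    "tube A: DNA (ug)": 20.0,
    "tube A: serum-free DMEM (ml)": 1.2,
    "tube B: lipofectamine (ul)": 120.0,
    "tube B: serum-free DMEM (ml)": 1.2,
    "serum-free DMEM added to plate (ml)": 10.0,
    "DMEM/10% FBS after 4 hr (ml)": 25.0,
}


def scale_recipe(n_plates: int, overage: float = 0.0) -> dict:
    """Multiply every per-plate quantity by n_plates, plus an optional fractional overage."""
    if n_plates < 1 or overage < 0:
        raise ValueError("n_plates must be >= 1 and overage >= 0")
    factor = n_plates * (1.0 + overage)
    return {item: round(amount * factor, 3) for item, amount in PER_PLATE.items()}


LIPID_PER_UG_DNA = PER_PLATE["tube B: lipofectamine (ul)"] / PER_PLATE["tube A: DNA (ug)"]

if __name__ == "__main__":
    for item, amount in scale_recipe(3).items():
        print(f"{item}: {amount}")
    print("lipofectamine per ug DNA (ul):", LIPID_PER_UG_DNA)  # 6.0
```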
1. Gi-Coupled Receptors: Co-Transfection with Gs-Coupled Receptors
In the case of GPR30, it has been determined that this receptor couples to the G protein Gi.

Gi is known to inhibit the enzyme adenylyl cyclase, which is necessary for catalyzing the
conversion of ATP to cAMP. Thus, a non-endogenous, constitutively activated form of GPR30
would be expected to be associated with decreased levels of cAMP. Although assay confirmation
of a non-endogenous, constitutively activated form of GPR30 directly via measurement of
decreasing levels of cAMP is viable, it can preferably be carried out by cooperative use of a
Gs-coupled receptor. For example, a receptor that is Gs-coupled will stimulate adenylyl cyclase,
and thus will be associated with an increase in cAMP. The assignee of the present application
has discovered that the orphan receptor GPR6 is an endogenous, constitutively activated GPCR.
GPR6 couples to the Gs protein. Thus, when the two are co-transfected, one can readily verify
that a putative GPR30 mutation leads to constitutive activation thereof; i.e., a cell co-expressing
endogenous, constitutively activated GPR6 and endogenous, non-constitutively activated GPR30
will evidence an elevated level of cAMP when compared with a cell co-expressing endogenous,
constitutively active GPR6 and non-endogenous, constitutively activated GPR30 (the latter
evidencing a comparatively lower level of cAMP).
Assays that detect cAMP can be utilized to determine if a candidate compound is, e.g., an inverse
agonist to a Gs-associated receptor (i.e., such a compound would decrease the levels of cAMP) or
to a Gi-associated receptor (or a Go-associated receptor) (i.e., such a candidate compound would
increase the levels of cAMP). A variety of approaches known in the art for measuring cAMP can
be utilized; a preferred approach relies upon the use of anti-cAMP antibodies. Another approach,
and most preferred, utilizes a whole cell second messenger reporter system assay. Promoters on
genes drive the expression of the proteins that a particular gene encodes. Cyclic AMP drives gene
expression by promoting the binding of a cAMP-responsive DNA binding protein or transcription
factor (CREB), which then binds to the promoter at specific sites called cAMP response elements
and drives the expression of the gene. Reporter systems can be constructed which have a promoter
containing multiple cAMP response elements before the reporter gene, e.g., β-galactosidase or
luciferase. Thus, an activated receptor such as GPR6 causes the accumulation of cAMP, which then
activates the gene and expression of the reporter protein. Most preferably, 293 cells are co-
transfected with GPR6 (or another Gs-linked receptor) and GPR30 (or another Gi-linked receptor)
plasmids, preferably in a 1:1 ratio, most preferably in a 1:4 ratio. Because GPR6 is an
endogenous, constitutively active receptor that stimulates the production of cAMP, GPR6 strongly
activates the reporter gene and its expression. The reporter protein, such as β-galactosidase or
luciferase, can then be detected using standard biochemical assays (Chen et al. 1995). Co-
transfection of endogenous, constitutively active GPR6 with endogenous, non-constitutively active
GPR30 evidences an increase in the luciferase reporter protein. Conversely, co-transfection of
endogenous, constitutively active GPR6 with non-endogenous, constitutively active GPR30
evidences a drastic decrease in expression of luciferase. Several reporter plasmids are known and
available in the art for measuring a second messenger assay. It is considered well within the
skill of the artisan to determine an appropriate reporter plasmid for a particular gene expression,
based primarily upon the particular need of the artisan. Although a variety of cells are available for
expression, mammalian cells are most preferred, and of these types, 293 cells are most preferred.

293 cells were transfected with the reporter plasmid pCRE-Luc/GPR6 and non-endogenous,
constitutively activated GPR30 using a Mammalian Transfection™ Kit (Stratagene, #200285)
CaPO4 precipitation protocol according to the manufacturer's instructions (see 28 Genomics 347
(1995) for the published endogenous GPR6 sequence). The precipitate contained 400ng reporter,
80ng CMV-expression plasmid (having a 1:4 GPR6 to endogenous GPR30 or non-endogenous
GPR30 ratio) and 20ng CMV-SEAP (a transfection control plasmid encoding secreted alkaline
phosphatase). 50% of the precipitate was split into 3 wells of a 96-well tissue culture dish
(containing 4 x 10^? cells/well); the remaining 50% was discarded. The following morning, the
media was changed. 48 hr after the start of the transfection, cells were lysed and examined for
luciferase activity using a LucLite™ Kit (Packard, Cat. 6016911) and a TriLux 1450 MicroBeta™
liquid scintillation and luminescence counter (Wallac) as per the vendor's instructions. The data
were analyzed using GraphPad Prism 2.0a (GraphPad Software Inc.).
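The DNA amounts that actually reach each well follow from the precipitate composition and the split just described. This sketch makes a hypothetical assumption that the text does not spell out: the 80ng of CMV-expression plasmid is taken to be the combined GPR6 plus GPR30 DNA, divided in the stated 1:4 ratio. Helper names are hypothetical.

```python
def split_by_ratio(total_ng: float, ratio: tuple) -> list:
    """Divide total_ng among components in proportion to ratio, e.g. (1, 4)."""
    parts = sum(ratio)
    return [total_ng * r / parts for r in ratio]


def per_well(amounts_ng: dict, fraction_used: float = 0.5, wells: int = 3) -> dict:
    """ng of each plasmid per well when `fraction_used` of the precipitate is split over `wells`."""
    return {name: ng * fraction_used / wells for name, ng in amounts_ng.items()}


gpr6_ng, gpr30_ng = split_by_ratio(80.0, (1, 4))  # assumed 1:4 split of the 80ng
precipitate = {
    "reporter": 400.0,
    "GPR6 plasmid": gpr6_ng,
    "GPR30 plasmid": gpr30_ng,
    "CMV-SEAP": 20.0,
}

if __name__ == "__main__":
    for name, ng in per_well(precipitate).items():
        print(f"{name}: {ng:.2f} ng/well")
```

Under this assumption, each well receives one sixth of every component (50% of the precipitate spread over 3 wells).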

With respect to GPR17, which has also been determined to be Gi-linked, a modification
of the foregoing approach was utilized, based upon, inter alia, use of another Gs-linked
endogenous receptor, GPR3 (see 23 Genomics 609 (1994) and 24 Genomics 391 (1994)). Most
preferably, 293 cells are utilized. These cells were plated out on 96-well plates at a density of
2 x 10^? cells per well and were transfected using Lipofectamine Reagent (BRL) the following day
according to the manufacturer's instructions. A DNA/lipid mixture was prepared for each 6-well
transfection as follows: 260ng of plasmid DNA in 100µl of DMEM were gently mixed with 2µl
of lipid in 100µl of DMEM (the 260ng of plasmid DNA consisted of 200ng of an 8xCRE-Luc
reporter plasmid (see below), 50ng of pCMV comprising endogenous receptor or non-endogenous
receptor, or pCMV alone, and 10ng of a GPR3 expression plasmid (GPR3 in pcDNA3
(Invitrogen))). The 8xCRE-Luc reporter plasmid was prepared as follows: vector SRIF-β-gal was
obtained by cloning the rat somatostatin promoter (-71/+51) at the BglV-HindIII site in the pβgal-
Basic Vector (Clontech). Eight (8) copies of the cAMP response element were obtained by PCR from
an adenovirus template AdpCF126CCRE8 (see 7 Human Gene Therapy 1883 (1996)) and cloned
into the SRIF-β-gal vector at the Kpn-BglV site, resulting in the 8xCRE-β-gal reporter vector. The
8xCRE-Luc reporter plasmid was generated by replacing the beta-galactosidase gene in the
8xCRE-β-gal reporter vector with the luciferase gene obtained from the pGL3-basic vector
(Promega) at the HindIII-BamHI site. Following 30 min incubation of the DNA/lipid mixture at room
temperature, the mixture was diluted with 400µl of DMEM and 100µl of the diluted mixture was added
to each well. 100µl of DMEM with 10% FCS were added to each well after a 4 hr incubation in
a cell culture incubator. The next morning the medium on the transfected cells was changed to 200µl/well of
DMEM with 10% FCS. Eight (8) hours later, the wells were changed to 100µl/well of DMEM
without phenol red after one wash with PBS. Luciferase activity was measured the next day
using the LucLite™ reporter gene assay kit (Packard) following the manufacturer's instructions and read
on a 1450 MicroBeta™ scintillation and luminescence counter (Wallac).
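The volumes above show how many wells one DNA/lipid mixture serves and how much of each plasmid each well receives. 100µl plus 100µl of mixture, diluted with 400µl, gives about 600µl, i.e., six 100µl portions, which is consistent with the "6-well transfection" wording. A sketch of that arithmetic follows; the helper name is hypothetical, and it ignores the 2µl volume of the lipid itself.

```python
def lipofection_per_well(plasmids_ng: dict, lipid_ul: float, mix_ul: float,
                         diluent_ul: float, per_well_ul: float):
    """Return (wells served, ng of each plasmid per well, ul of lipid per well)."""
    total_ul = mix_ul + diluent_ul
    wells = int(total_ul // per_well_ul)
    fraction = per_well_ul / total_ul  # share of the diluted mixture added to one well
    dna = {name: ng * fraction for name, ng in plasmids_ng.items()}
    return wells, dna, lipid_ul * fraction


PLASMIDS = {"8xCRE-Luc": 200.0, "pCMV receptor (or pCMV)": 50.0, "GPR3": 10.0}

if __name__ == "__main__":
    wells, dna, lipid = lipofection_per_well(PLASMIDS, lipid_ul=2.0, mix_ul=200.0,
                                             diluent_ul=400.0, per_well_ul=100.0)
    print(wells)  # 6
    print({name: round(ng, 2) for name, ng in dna.items()}, round(lipid, 3))
```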

Figure 4 evidences that constitutively active GPR30 inhibits GPR6-mediated
activation of the CRE-Luc reporter in 293 cells. Luciferase was measured at about 4.1 relative
light units in the expression vector pCMV. Endogenous GPR30 expressed luciferase at about
8.5 relative light units, whereas the non-endogenous, constitutively active GPR30 (L258K)
expressed luciferase at about 3.8 and 3.1 relative light units, respectively. Co-transfection of
endogenous GPR6 with endogenous GPR30, at a 1:4 ratio, drastically increased luciferase



WO 00/22129



PCT/US99/23938



- 62 - 

expression to about 104.1 relative light units. Co-transfection of endogenous GPR6 with non-
endogenous GPR30 (L258K), at the same ratio, drastically decreased the expression, which
is evident at about 18.2 and 29.5 relative light units, respectively. Similar results were
observed with respect to GPR17 upon co-transfection with GPR3, as set forth in
Figure 5.
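The size of the GPR30 (L258K) effect can be expressed relative to the endogenous GPR6/GPR30 co-transfection. The sketch below only reuses the approximate relative-light-unit values quoted above and does not reanalyze Figure 4. The labels for the two L258K readings are hypothetical, since the text reports them only as "18.2 and 29.5 ... respectively".

```python
def fold_change(reference: float, value: float) -> float:
    """How many times smaller `value` is than `reference`."""
    return reference / value


def percent_decrease(reference: float, value: float) -> float:
    """Decrease from `reference` to `value`, as a percentage of `reference`."""
    return 100.0 * (reference - value) / reference


GPR6_PLUS_ENDOGENOUS_GPR30 = 104.1  # RLU, 1:4 co-transfection
GPR6_PLUS_L258K = {"reading 1": 18.2, "reading 2": 29.5}  # RLU, same ratio (labels hypothetical)

if __name__ == "__main__":
    for label, rlu in GPR6_PLUS_L258K.items():
        print(f"{label}: {fold_change(GPR6_PLUS_ENDOGENOUS_GPR30, rlu):.2f}-fold lower, "
              f"{percent_decrease(GPR6_PLUS_ENDOGENOUS_GPR30, rlu):.1f}% decrease")
```

With these inputs the reductions come to roughly 5.7-fold (about 82.5%) and 3.5-fold (about 71.7%).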
Example 3 

ASSAYS FOR DETERMINATION OF CONSTITUTIVE ACTIVITY
OF NON-ENDOGENOUS GPCRs

A. Membrane Binding Assays

1. [35S]GTPγS Assay

When a G protein-coupled receptor is in its active state, either as a result of ligand binding
or constitutive activation, the receptor couples to a G protein and stimulates the release of GDP
and subsequent binding of GTP to the G protein. The alpha subunit of the G protein-receptor
complex acts as a GTPase and slowly hydrolyzes the GTP to GDP, at which point the receptor
normally is deactivated. Constitutively activated receptors continue to exchange GDP for GTP.
The non-hydrolyzable GTP analog, [35S]GTPγS, can be utilized to demonstrate enhanced binding
of [35S]GTPγS to membranes expressing constitutively activated receptors. The advantages of
using [35S]GTPγS binding to measure constitutive activation are that (a) it is generically applicable
to all G protein-coupled receptors; and (b) it is proximal at the membrane surface, making it less likely
to pick up molecules which affect the intracellular cascade.

The assay utilizes the ability of G protein-coupled receptors to stimulate [35S]GTPγS
binding to membranes expressing the relevant receptors. The assay can, therefore, be used in
the direct identification method to screen candidate compounds to known, orphan and
constitutively activated G protein-coupled receptors. The assay is generic and has application
to drug discovery at all G protein-coupled receptors.

The [35S]GTPγS assay was incubated in 20 mM HEPES and between 1 and about 20mM MgCl2
(this amount can be adjusted for optimization of results, although 20mM is preferred), pH 7.4,
binding buffer with between about 0.3 and about 1.2 nM [35S]GTPγS (this amount can be adjusted
for optimization of results, although 1.2 nM is preferred) and 12.5 to 75µg membrane protein (e.g.,
from COS-7 cells expressing the receptor; this amount can be adjusted for optimization, although 75µg
is preferred) and 1µM GDP (this amount can be changed for optimization) for 1 hour.
Wheatgerm agglutinin beads (25µl; Amersham) were then added and the mixture was incubated
for another 30 minutes at room temperature. The tubes were then centrifuged at 1500 x g for 5
minutes at room temperature and then counted in a scintillation counter.
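The [35S]GTPγS reaction is specified by final concentrations, so the volume of each stock to add follows from C1V1 = C2V2. The stock concentrations and the 200µl reaction volume below are hypothetical placeholders, because the text gives neither. Only the final concentrations (20mM HEPES, 20mM MgCl2, 1µM GDP) come from the protocol above.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, final_volume_ul: float) -> float:
    """Stock volume (ul) giving final_conc in final_volume_ul; concentrations in the same units."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * final_volume_ul / stock_conc


# (final concentration, hypothetical stock concentration), both in mM
COMPONENTS = {
    "HEPES pH 7.4": (20.0, 1000.0),
    "MgCl2": (20.0, 1000.0),
    "GDP": (0.001, 0.1),  # 1 uM final from a hypothetical 100 uM stock
}
REACTION_UL = 200.0  # hypothetical reaction volume

if __name__ == "__main__":
    for name, (final, stock) in COMPONENTS.items():
        print(f"{name}: {stock_volume_ul(final, stock, REACTION_UL):.2f} ul")
```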

A less costly but equally applicable alternative has been identified which also meets the
needs of large scale screening. Flash plates™ and Wallac™ scintistrips may be utilized to format
a high throughput [35S]GTPγS binding assay. Furthermore, using this technique, the assay can be
utilized for known GPCRs to simultaneously monitor tritiated ligand binding to the receptor at the
same time as monitoring the efficacy via [35S]GTPγS binding. This is possible because the Wallac
beta counter can switch energy windows to look at both tritium and 35S-labeled probes. This assay
may also be used to detect other types of membrane activation events resulting in receptor
activation. For example, the assay may be used to monitor 32P phosphorylation of a variety of
receptors (both G protein-coupled and tyrosine kinase receptors). When the membranes are
centrifuged to the bottom of the well, the bound [35S]GTPγS or the 32P-phosphorylated receptor
will activate the scintillant which is coated on the wells. Scintistrips (Wallac) have been used to
demonstrate this principle. In addition, the assay also has utility for measuring ligand binding to
receptors using radioactively labeled ligands. In a similar manner, when the radiolabeled bound
ligand is centrifuged to the bottom of the well, the scintistrip label comes into proximity with the
radiolabeled ligand, resulting in activation and detection.

Representative results comparing Control (pCMV), Endogenous APJ and Non-
Endogenous APJ, based upon the foregoing protocol, are set forth in Figure 6.
2. Adenylyl Cyclase

A Flash Plate™ Adenylyl Cyclase kit (New England Nuclear; Cat. No. SMP004A)
designed for cell-based assays was modified for use with crude plasma membranes. The Flash
Plate wells contain a scintillant coating which also contains a specific antibody recognizing cAMP.
The cAMP generated in the wells was quantitated by a direct competition for binding of
radioactive cAMP tracer to the cAMP antibody. The following serves as a brief protocol for the
measurement of changes in cAMP levels in membranes that express the receptors.

Transfected cells were harvested approximately three days after transfection. Membranes
were prepared by homogenization of suspended cells in buffer containing 20mM HEPES, pH 7.4
and 10mM MgCl2. Homogenization was performed on ice using a Brinkmann Polytron™ for
approximately 10 seconds. The resulting homogenate was centrifuged at 49,000 x g for 15
minutes at 4°C. The resulting pellet was then resuspended in buffer containing 20mM HEPES,
pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, followed by centrifugation at 49,000 x
g for 15 minutes at 4°C. The resulting pellet can be stored at -80°C until utilized. On the day of
measurement, the membrane pellet was slowly thawed at room temperature and resuspended in buffer
containing 20mM HEPES, pH 7.4 and 10mM MgCl2 (these amounts can be optimized, although
the values listed herein are preferred), to yield a final protein concentration of 0.60mg/ml (the
resuspended membranes were placed on ice until use).
cAMP standards and Detection Buffer (comprising 2 µCi of tracer [125I]cAMP (100 µl) to
11 ml Detection Buffer) were prepared and maintained in accordance with the manufacturer's
instructions. Assay Buffer was prepared fresh for screening and contained 20mM HEPES, pH 7.4,
10mM MgCl2, 20mM (Sigma), 0.1 units/ml creatine phosphokinase (Sigma), 50 µM GTP
(Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice until utilized. The assay
was initiated by addition of 50µl of assay buffer followed by addition of 50µl of membrane
suspension to the NEN Flash Plate. The resultant assay mixture is incubated for 60 minutes at
room temperature followed by addition of 100µl of detection buffer. Plates are then incubated an
additional 2-4 hours followed by counting in a Wallac MicroBeta scintillation counter. Values of
cAMP/well are extrapolated from a standard cAMP curve which is contained within each assay
plate. The foregoing assay was utilized with respect to analysis of MIG.
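In the Flash Plate format, sample cAMP competes with the [125I]cAMP tracer, so counts fall as cAMP rises, and unknowns are read off the standard curve. The sketch below uses simple linear interpolation on a log10(cAMP) axis between the two bracketing standards. This method is swapped in for illustration only; the standard values are hypothetical, and the kit instructions may call for a different curve fit. The sketch also computes the membrane protein per well implied above (50µl at 0.60mg/ml).

```python
import math


def interpolate_camp(cpm: float, standards: list) -> float:
    """Estimate cAMP for a sample from (cAMP, cpm) standards, cpm decreasing as cAMP rises.

    Interpolates linearly in log10(cAMP) between the two standards that bracket `cpm`.
    """
    points = sorted(standards)  # ascending cAMP, hence descending cpm
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= cpm <= y0:
            t = (y0 - cpm) / (y0 - y1)
            log_x = math.log10(x0) + t * (math.log10(x1) - math.log10(x0))
            return 10 ** log_x
    raise ValueError("sample lies outside the standard curve")


# Hypothetical standards: (pmol cAMP per well, counts per minute)
STANDARDS = [(1.0, 9000.0), (10.0, 6000.0), (100.0, 2000.0)]

MEMBRANE_UL = 50.0
PROTEIN_MG_PER_ML = 0.60
PROTEIN_UG_PER_WELL = MEMBRANE_UL * PROTEIN_MG_PER_ML  # ul x mg/ml = ug

if __name__ == "__main__":
    print(round(interpolate_camp(4000.0, STANDARDS), 2))  # 31.62
    print(PROTEIN_UG_PER_WELL)
```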
B. Reporter-Based Assays

1. CREB Reporter Assay (Gs-associated receptors)
A method to detect Gs stimulation depends on the known property of the transcription
factor CREB, which is activated in a cAMP-dependent manner. A PathDetect CREB trans-
Reporting System (Stratagene, Catalogue #219010) was utilized to assay for Gs-coupled

activity in 293 or 293T cells. Cells were transfected with the plasmid components of the
above system and the indicated expression plasmid encoding endogenous or mutant receptor
using a Mammalian Transfection Kit (Stratagene, Catalogue #200285) according to the
manufacturer's instructions. Briefly, 400 ng pFR-Luc (luciferase reporter plasmid containing
Gal4 recognition sequences), 40 ng pFA2-CREB (Gal4-CREB fusion protein containing the
Gal4 DNA-binding domain), 80 ng CMV-receptor expression plasmid (comprising the
receptor) and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid; alkaline
phosphatase activity is measured in the media of transfected cells to control for variations in
transfection efficiency between samples) were combined in a calcium phosphate precipitate
as per the Kit's instructions. Half of the precipitate was equally distributed over 3 wells in a
96-well plate, kept on the cells overnight, and replaced with fresh medium the following
morning. Forty-eight (48) hr after the start of the transfection, cells were treated and assayed
for luciferase activity as set forth with respect to the GPR30 system, above. This assay was
used with respect to GHSR.
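CMV-SEAP is included to control for well-to-well differences in transfection efficiency. One common way to use it, which is an assumption here because the text does not specify the calculation, is to divide each well's luciferase signal by its SEAP signal before averaging the triplicate wells and comparing the receptor to the vector control. All numbers below are hypothetical.

```python
from statistics import mean


def normalized_mean(luciferase: list, seap: list) -> float:
    """Mean of per-well luciferase/SEAP ratios."""
    if len(luciferase) != len(seap) or not luciferase:
        raise ValueError("need matching, non-empty luciferase and SEAP readings")
    return mean(luc / s for luc, s in zip(luciferase, seap))


# Hypothetical triplicate readings (arbitrary units)
receptor = normalized_mean([1200.0, 1000.0, 1400.0], [100.0, 100.0, 100.0])
vector = normalized_mean([300.0, 300.0, 300.0], [100.0, 150.0, 75.0])

if __name__ == "__main__":
    print(receptor, vector, receptor / vector)  # 12.0 3.0 4.0
```

In the control wells the raw luciferase counts are identical, but after SEAP normalization they differ, because each well was transfected with different efficiency.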

2. AP1 Reporter Assay (Gq-associated receptors)
A method to detect Gq stimulation depends on the known property of Gq-dependent
phospholipase C to cause the activation of genes containing AP1 elements in their promoter.
A PathDetect AP-1 cis-Reporting System (Stratagene, Catalogue #219073) was utilized
following the protocol set forth above with respect to the CREB reporter assay, except that the
components of the calcium phosphate precipitate were 410 ng pAP1-Luc, 80 ng receptor
expression plasmid, and 20 ng CMV-SEAP. This assay was used with respect to ETBR-LP2.
C. Intracellular IP3 Accumulation Assay

On day 1, cells comprising the serotonin receptors (endogenous and mutated) were
plated onto 24-well plates, usually 1 x 10^? cells/well. On day 2 cells were transfected by first
mixing 0.25µg DNA in 50µl serum-free DMEM/well and 2µl lipofectamine in 50µl
serum-free DMEM/well. The solutions were gently mixed and incubated for 15-30 min at
room temperature. Cells were washed with 0.5 ml PBS, and 400µl of serum-free media was
mixed with the transfection media and added to the cells. The cells were then incubated for
3-4 hrs at 37°C/5% CO2, and then the transfection media was removed and replaced with
1 ml/well of regular growth media. On day 3 the cells were labeled with 3H-myo-inositol.
Briefly, the media was removed and the cells were washed with 0.5 ml PBS. Then 0.5 ml inositol-
free/serum-free media (GIBCO BRL) was added/well with 0.25 µCi of 3H-myo-inositol/well,
and the cells were incubated for 16-18 hrs (overnight) at 37°C/5% CO2. On Day 4 the cells were
washed with 0.5 ml PBS, and either 0.45 ml of assay medium containing inositol-
free/serum-free media, 10 µM pargyline and 10 mM lithium chloride, or 0.4 ml of assay medium
and 50µl of 10x ketanserin (ket) to a final concentration of 10µM, was added. The cells were then
incubated for 30 min at 37°C. The cells were then washed with 0.5 ml PBS and 200µl of
fresh/ice-cold stop solution (1 M KOH; 18 mM Na-borate; 3.8 mM EDTA) was added/well.
The solution was kept on ice for 5-10 min or until the cells were lysed, and then neutralized by
200µl of fresh/ice-cold neutralization solution (7.5% HCl). The lysate was then transferred into
1.5 ml Eppendorf tubes and 1 ml of chloroform/methanol (1:2) was added/tube. The solution
was vortexed for 15 sec and the upper phase was applied to a Bio-Rad AG1-X8 anion
exchange resin (100-200 mesh). First, the resin was washed with water at 1:1.25 W/V and
0.9 ml of upper phase was loaded onto the column. The column was washed with 10 ml of
5 mM myo-inositol and 10 ml of 5 mM Na-borate/60 mM Na-formate. The inositol tris
phosphates were eluted into scintillation vials containing 10 ml of scintillation cocktail with
2 ml of 0.1 M formic acid/1 M ammonium formate. The columns were regenerated by
washing with 10 ml of 0.1 M formic acid/3 M ammonium formate, rinsed twice with dd
H2O and stored at 4°C in water.

Figure 7 provides an illustration of IP3 production from the human 5-HT2A receptor that incorporates the C322K mutation. While these results evidence that the Proline Mutation Algorithm approach constitutively activates this receptor, a more robust difference would be preferred for purposes of using such a receptor in screening to identify potential therapeutics. However, because the activated receptor can be utilized for understanding and elucidating the role of constitutive activation, and for identifying compounds that can be further examined, we believe that this difference is itself useful in differentiating between the endogenous and non-endogenous versions of the human receptor.
D. Result Summary 

The results for the GPCRs tested are set forth in Table E, where the Per-Cent Difference indicates the percentage difference in results observed for the non-endogenous GPCR as compared to the endogenous GPCR; these values are followed by parenthetical indications of the type of assay utilized. Additionally, the assay system utilized is listed parenthetically (and, in cases where different Host Cells were used, both are listed). As these results indicate, a variety of assays can be utilized to determine the constitutive activity of the non-endogenous versions of the human GPCRs. Those skilled in the art, based upon the foregoing and with reference to information available to the art, are credited with the ability to select and/or optimize a particular assay approach that suits the particular needs of the investigator.
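By way of illustration only, a Per-Cent Difference of the kind reported in Table E can be computed as below. The sketch assumes the difference is expressed relative to the endogenous receptor's signal, a convention the text does not spell out; the function name and the readings are hypothetical and are not data from Table E.

```python
def percent_difference(endogenous_signal: float, non_endogenous_signal: float) -> float:
    """Per-cent difference of the non-endogenous (mutant) GPCR signal versus the
    endogenous GPCR signal, assuming the endogenous signal is the reference."""
    if endogenous_signal == 0:
        raise ValueError("endogenous_signal must be non-zero")
    return (non_endogenous_signal - endogenous_signal) / endogenous_signal * 100.0


# Hypothetical reporter readings in arbitrary units (not values from Table E).
print(percent_difference(1000.0, 1250.0))  # 25.0
```

If an investigator instead normalizes to the non-endogenous signal, only the denominator changes; the convention chosen should be stated alongside the reported value.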



Table E

Receptor Identifier (Codon Mutation) | Per-Cent Difference (Assay)
GPR17 (V234K) | 74.5 (CRE-Luc)
GPR30 (L258K) | 71.6 (CREB)
APJ (L247K) | 49.0 (GTPγS)
ETBR-LP2 (N358K) | 48.4 (AP1-Luc-293); 61.1 (AP1-Luc-293T)
GHSR (V262K) | 58.9 (CREB-293); 35.6 (CREB-293T)
MIG (I230K) | 39 (cAMP)
Serotonin 5HT2A (C322K) | 33.2 (IP3)
Serotonin 5HT2C (S310K) | 39.1 (IP3)

Example 6 

Tissue Distribution of Endogenous Orphan GPCRs 

Using a commercially available human-tissue dot-blot format, endogenous orphan GPCRs were probed to determine the tissues in which such receptors are localized. Except as indicated below, the entire receptor cDNA (radiolabeled) was used as the probe: radiolabeled probe was generated from the complete receptor cDNA (excised from the vector) using a Prime-It II™ Random Primer Labeling Kit (Stratagene, #300385), according to the manufacturer's instructions. A Human RNA Master Blot™ (Clontech, #7770-1) was hybridized with the radiolabeled GPCR probe and washed under stringent conditions according to the manufacturer's instructions. The blot was exposed to Kodak BioMax Autoradiography film overnight at -

Representative dot-blot format results are presented in Figure 8 for GPR1 (8A), GPR30 (8B), and APJ (8C), with results for all receptors summarized in Table F.

Table F

GPCR | Tissue Distribution (highest levels, relative to other tissues in the dot-blot)
GPR1 | Placenta, Ovary, Adrenal
GPR4 | Broad; highest in Heart, Lung, Adrenal, Thyroid, Spinal Cord
GPR5 | Placenta, Thymus, Fetal Thymus; lesser levels in Spleen, Fetal Spleen
GPR7 | Liver, Spleen, Spinal Cord, Placenta
GPR8 | No expression detected
GPR9-6 | Thymus, Fetal Thymus; lesser levels in Small Intestine
GPR18 | Spleen, Lymph Node, Fetal Spleen, Testis
GPR20 | Broad
GPR21 | Broad; very low abundance
GPR22 | Heart, Fetal Heart; lesser levels in Brain
GPR30 | Stomach
GPR31 | Broad
BLR1 | Spleen
CEPR | Stomach, Liver, Thyroid, Putamen
EBI1 | Pancreas; lesser levels in Lymphoid Tissues
EBI2 | Lymphoid Tissues, Aorta, Lung, Spinal Cord
ETBR-LP2 | Broad; Brain Tissue
GPCR-CNS | Brain; lesser levels in Testis, Placenta
GPR-NGA | Pituitary; lesser levels in Brain
H9 | Pituitary
HB954 | Aorta, Cerebellum; lesser levels in most other tissues
HM74 | Spleen, Leukocytes, Bone Marrow, Mammary Glands, Lung, Trachea
MIG | Low levels in Kidney, Liver, Pancreas, Lung, Spleen
ORG1 | Pituitary, Stomach, Placenta
V28 | Brain, Spleen, Peripheral Leukocytes

Based upon the foregoing information, it is noted that human GPCRs can also be assessed for distribution in diseased tissue; comparative assessments between "normal" and diseased tissue can then be utilized to determine the potential for over-expression or under-expression of a particular receptor in a diseased state. In those circumstances where it is desirable to utilize the non-endogenous versions of the human GPCRs for screening to directly identify candidate compounds of potential therapeutic relevance, it is noted that inverse agonists are useful in the treatment of diseases and disorders in which a particular human GPCR is over-expressed, whereas agonists or partial agonists are useful in the treatment of diseases and disorders in which a particular human GPCR is under-expressed.

As desired, more detailed cellular localization of the receptors, using techniques well known to those in the art (e.g., in-situ hybridization), can be utilized to identify the particular cells within these tissues in which the receptor of interest is expressed.

It is intended that each of the patents, applications, and printed publications mentioned in this patent document be hereby incorporated by reference in their entirety.

As those skilled in the art will appreciate, numerous changes and modifications may be made to the preferred embodiments of the invention without departing from the spirit of the invention. It is intended that all such variations fall within the scope of the invention.

Although a variety of expression vectors are available to those in the art, for purposes of utilization for both the endogenous and non-endogenous human GPCRs, it is most preferred that the vector utilized be pCMV. This vector has been deposited with the American Type Culture Collection (ATCC) on October 13, 1998 (10801 University Blvd., Manassas, VA 20110-2209 USA) under the provisions of the Budapest Treaty for the International Recognition of the Deposit of Microorganisms for the Purpose of Patent Procedure. The vector was tested by the ATCC on , 1998 and determined to be viable on , 1998. The ATCC has assigned
CLAIMS

What is claimed is:

1. A constitutively active, non-endogenous version of an endogenous human orphan G protein-coupled receptor (GPCR) comprising the following amino acid residues (carboxy-terminus to amino-terminus orientation) transversing the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions of the non-endogenous GPCR:

$P^{1}\;AA_{15}\;X$

wherein:

(1) $P^{1}$ is an amino acid residue located within the TM6 region of the non-endogenous GPCR, where $P^{1}$ is selected from the group consisting of (i) the endogenous orphan GPCR proline residue, and (ii) a non-endogenous amino acid residue other than proline;

(2) $AA_{15}$ are 15 amino acid residues selected from the group consisting of (a) the 15 endogenous amino acid residues of the endogenous orphan GPCR, (b) 15 non-endogenous amino acid residues, and (c) a combination of 15 amino acid residues, the combination comprising at least one endogenous amino acid residue of the endogenous orphan GPCR and at least one non-endogenous amino acid residue, excepting that none of the 15 endogenous amino acid residues that are positioned within the TM6 region of the GPCR is proline; and

(3) X is a non-endogenous amino acid residue located within the IC3 region of said non-endogenous GPCR.

2. The non-endogenous human GPCR of claim 1 wherein $P^{1}$ is the endogenous proline residue.

WO 00/22129 PCT/US99/23938



3. The non-endogenous human GPCR of claim 1 wherein $P^{1}$ is a non-endogenous amino acid residue other than a proline residue.

4. The non-endogenous human GPCR of claim 1 wherein $AA_{15}$ are the 15 endogenous amino acid residues of the endogenous GPCR.

5. The non-endogenous human GPCR of claim 1 wherein X is selected from the group consisting of lysine, histidine, arginine and alanine residues, excepting that when the endogenous amino acid in position X of said endogenous human GPCR is lysine, X is selected from the group consisting of histidine, arginine and alanine.

6. The non-endogenous human GPCR of claim 1 wherein X is a lysine residue, excepting that when the endogenous amino acid in position X of said endogenous human GPCR is lysine, X is an amino acid other than lysine.

7. The non-endogenous human GPCR of claim 4 wherein X is a lysine residue, excepting that when the endogenous amino acid in position X of said endogenous human GPCR is lysine, X is an amino acid other than lysine.

8. The non-endogenous human GPCR of claim 1 wherein $P^{1}$ is a proline residue and X is a lysine residue, excepting that when the endogenous amino acid in position X of said endogenous human GPCR is lysine, X is an amino acid other than lysine.

9. A host cell comprising the non-endogenous human GPCR of claim 1.

10. The material of claim 9 wherein said host cell is of mammalian origin.

11. The non-endogenous human GPCR of claim 1 in a purified and isolated form.

12. A nucleic acid sequence encoding a constitutively active, non-endogenous version of an endogenous human orphan G protein-coupled receptor (GPCR) comprising the following nucleic acid sequence region transversing the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions of the orphan GPCR:

$3'\text{-}P^{1}_{codon}\;(AA\text{-}codon)_{15}\;X_{codon}\text{-}5'$

wherein:

(1) $P^{1}_{codon}$ is a nucleic acid encoding region within the TM6 region of the non-endogenous GPCR, where $P^{1}_{codon}$ encodes an amino acid selected from the group consisting of (i) the endogenous GPCR proline residue, and (ii) a non-endogenous amino acid residue other than proline;

(2) $(AA\text{-}codon)_{15}$ are 15 codons encoding 15 amino acid residues selected from the group consisting of (a) the 15 endogenous amino acid residues of the endogenous orphan GPCR, (b) 15 non-endogenous amino acid residues, and (c) a combination of 15 amino acid residues, the combination comprising at least one endogenous amino acid residue of the endogenous orphan GPCR and at least one non-endogenous amino acid residue, excepting that none of the 15 endogenous amino acid residues that are positioned within the TM6 region of the orphan GPCR is proline; and

(3) $X_{codon}$ is a nucleic acid encoding region residue located within the IC3 region of said non-endogenous human GPCR, where $X_{codon}$ encodes a non-endogenous amino acid.

13. The nucleic acid sequence of claim 12 wherein $P^{1}_{codon}$ encodes an endogenous proline residue.

14. The nucleic acid sequence of claim 12 wherein $P^{1}_{codon}$ encodes a non-endogenous amino acid residue other than a proline residue.

15. The nucleic acid sequence of claim 12 wherein $X_{codon}$ encodes a non-endogenous amino acid selected from the group consisting of lysine, histidine, arginine and alanine, excepting that when the endogenous amino acid in position X of said endogenous human GPCR is lysine, $X_{codon}$ encodes an amino acid selected from the group consisting of histidine, arginine and alanine.

16. The nucleic acid sequence of claim 13 wherein $X_{codon}$ encodes a non-endogenous lysine amino acid, excepting that when the endogenous amino acid in position X of said endogenous human GPCR is lysine, $X_{codon}$ encodes an amino acid selected from the group consisting of histidine, arginine and alanine.

17. The nucleic acid sequence of claim 12 wherein $X_{codon}$ is selected from the group consisting of AAA, AAG, GCA, GCG, GCC and GCU.

18. The nucleic acid sequence of claim 12 wherein $X_{codon}$ is selected from the group consisting of AAA and AAG.

19. The nucleic acid sequence of claim 12 wherein $P^{1}_{codon}$ is selected from the group consisting of CCA, CCC, CCG and CCU, and $X_{codon}$ is selected from the group consisting of AAA and AAG.
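As an illustration only, the codon arrangement recited in claims 12 and 17-19 can be checked programmatically. The sketch below assumes a coding sequence read 5' to 3', so that the X codon lies 16 codons upstream of the proline codon (one proline codon, 15 intervening codons, then X). The function names, the toy sequence and the convention of reading DNA "T" as RNA "U" are assumptions made for the example.

```python
PROLINE_CODONS = {"CCA", "CCC", "CCG", "CCU"}  # P codon set recited in claim 19
LYSINE_CODONS = {"AAA", "AAG"}                 # X codon set recited in claims 18 and 19


def normalize(cds: str) -> str:
    """Upper-case the sequence and read DNA 'T' as RNA 'U' (assumed convention)."""
    return cds.upper().replace("T", "U")


def codon_at(cds: str, codon_index: int) -> str:
    """Return the codon at a 0-based codon index, reading 5' to 3'."""
    start = 3 * codon_index
    return cds[start:start + 3]


def x_codon_index(proline_codon_index: int) -> int:
    """P codon + 15 intervening codons + X codon: X lies 16 codons 5' of P."""
    return proline_codon_index - 16


def matches_claim_19(cds: str, proline_codon_index: int) -> bool:
    """True if the P codon is a proline codon and the X codon is a lysine codon."""
    cds = normalize(cds)
    x_index = x_codon_index(proline_codon_index)
    if x_index < 0 or 3 * (proline_codon_index + 1) > len(cds):
        return False
    return (codon_at(cds, proline_codon_index) in PROLINE_CODONS
            and codon_at(cds, x_index) in LYSINE_CODONS)


# Toy coding sequence (hypothetical): AAA, then 15 x GCU, then CCU at codon index 16.
toy = "AAA" + "GCU" * 15 + "CCU"
print(matches_claim_19(toy, 16))  # True
```

Locating the proline codon within TM6 in the first place is left to the user; the sketch only verifies the spacing and codon identities once that index is known.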

20. A vector comprising the nucleic acid sequence of claim 12.

21. A plasmid comprising the nucleic acid sequence of claim 12.

22. A host cell comprising the nucleic acid sequence of claim 21.

23. The nucleic acid sequence of claim 12 in a purified and isolated form.

24. A method for selecting for alteration an endogenous amino acid residue within the third intracellular loop of a human G protein-coupled receptor ("GPCR"), said receptor comprising a transmembrane 6 region and an intracellular loop 3 region, which endogenous amino acid, when altered to a non-endogenous amino acid, constitutively activates said human GPCR, comprising the following steps:

(a) identifying an endogenous proline residue within the transmembrane 6 region of a human GPCR;

(b) identifying, by moving in a direction of the carboxy-terminus region of said GPCR towards the amino-terminus region of said GPCR, the endogenous, 16th amino acid residue from said proline residue;

(c) altering the endogenous residue of step (b) to a non-endogenous amino acid residue to create a non-endogenous version of an endogenous human GPCR; and

(d) determining whether the non-endogenous human GPCR of step (c) is constitutively active.
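Steps (a) through (c) of claim 24 amount to index arithmetic on the receptor's amino acid sequence and can be sketched as follows. This is an illustrative sketch only: it assumes the TM6 proline has already been identified (the claim does not say how TM6 is delimited), uses a 0-based index for the proline and a 1-based position in the mutation label, and, following the lysine exception of claims 5 through 8, substitutes an alternative residue (alanine, an arbitrary choice here) when the target is already lysine. Step (d) is an experimental determination and is not modeled. All names and the toy sequence are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProlineMutation:
    position: int          # 1-based position of the altered residue
    original: str          # endogenous residue at that position
    replacement: str       # non-endogenous residue introduced
    mutant_sequence: str   # full sequence carrying the substitution

    @property
    def label(self) -> str:
        """Conventional label in the style used above, e.g. 'C322K'."""
        return f"{self.original}{self.position}{self.replacement}"


def proline_mutation(sequence: str, proline_index: int,
                     replacement: str = "K", fallback: str = "A") -> ProlineMutation:
    """Alter the 16th residue N-terminal to a TM6 proline (0-based proline_index)."""
    if not 0 <= proline_index < len(sequence) or sequence[proline_index] != "P":
        raise ValueError("proline_index must point at a proline residue")
    target = proline_index - 16  # count 16 residues from P toward the N-terminus
    if target < 0:
        raise ValueError("fewer than 16 residues precede the proline")
    original = sequence[target]
    # Lysine exception: if the endogenous residue is already lysine, use another residue.
    new = fallback if original == replacement else replacement
    mutant = sequence[:target] + new + sequence[target + 1:]
    return ProlineMutation(target + 1, original, new, mutant)


# Toy sequence (hypothetical): M, C, 15 x A, then the TM6 proline at index 17.
toy = "MC" + "A" * 15 + "PG"
print(proline_mutation(toy, 17).label)  # C2K
```

Keeping the 15 intervening residues untouched mirrors the $P^{1}\;AA_{15}\;X$ arrangement of claim 1 in its simplest form, where only X is non-endogenous.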

25. The method of claim 24 wherein the amino acid residue that is two residues from said proline residue in the transmembrane 6 region, in a carboxy-terminus to amino-terminus direction, is tryptophan.
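The further condition of claim 25, a tryptophan two residues N-terminal to the TM6 proline, can serve as a filter when more than one proline falls within the region taken to be TM6. The sketch below is illustrative only: the TM6 window boundaries are assumed to be supplied by the user (for example from a topology prediction, which the claim does not specify), and the function name and toy sequence are hypothetical. This W-x-P arrangement corresponds to the CWxP motif commonly described in TM6 of rhodopsin-like GPCRs.

```python
def candidate_tm6_prolines(sequence: str, tm6_start: int, tm6_end: int) -> list[int]:
    """0-based indices of prolines in the half-open window [tm6_start, tm6_end)
    that have a tryptophan two residues toward the N-terminus (W-x-P)."""
    lo = max(tm6_start, 2)            # need room for the residue at i - 2
    hi = min(tm6_end, len(sequence))  # do not run past the end of the sequence
    return [i for i in range(lo, hi)
            if sequence[i] == "P" and sequence[i - 2] == "W"]


# Toy sequence (hypothetical) with one W-x-P arrangement.
print(candidate_tm6_prolines("AAACWLPFFAP", 0, 11))  # [6]
```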

26. A constitutively active, non-endogenous human GPCR produced by the process of 
claim 24. 

27. A constitutively active, non-endogenous human GPCR produced by the process of 
claim 25. 

28. An algorithmic approach for creating a non-endogenous, constitutively active version of an endogenous human G protein coupled receptor (GPCR), said endogenous GPCR comprising a transmembrane 6 region and an intracellular loop 3 region, the algorithmic approach comprising the steps of:

(a) selecting an endogenous human GPCR comprising a proline residue in the transmembrane-6 region;

(b) identifying, by counting 16 amino acid residues from the proline residue of step (a), in a carboxy-terminus to amino-terminus direction, an endogenous amino acid residue;

(c) altering the identified amino acid residue of step (b) to a non-endogenous amino acid residue to create a non-endogenous version of the endogenous human GPCR; and

(d) determining if the non-endogenous version of the endogenous human GPCR of step (c) is constitutively active.

29. The algorithmic approach of claim 28 wherein the amino acid residue that is two residues from said proline residue in the transmembrane 6 region, in a carboxy-terminus to amino-terminus direction, is tryptophan.

30. A constitutively active, non-endogenous human GPCR produced by the algorithmic approach of claim 28.

31. A constitutively active, non-endogenous human GPCR produced by the algorithmic approach of claim 29.

32. A method for directly identifying a compound selected from the group consisting of inverse agonists, agonists and partial agonists to a non-endogenous, constitutively activated human G protein coupled receptor, said receptor comprising a transmembrane-6 region and an intracellular loop-3 region, comprising the steps of:

(a) selecting an endogenous human GPCR;

(b) identifying a proline residue within the transmembrane-6 region of the GPCR of step (a);

(c) identifying, in a carboxy-terminus to amino-terminus direction, the endogenous, 16th amino acid residue from the proline residue of step (b);

(d) altering the endogenous amino acid of step (c) to a non-endogenous amino acid;

(e) confirming that the non-endogenous GPCR of step (d) is constitutively active;

(f) contacting a candidate compound with the non-endogenous, constitutively-activated GPCR of step (e); and

(g) determining, by measurement of the compound efficacy at said contacted receptor, whether said compound is an inverse agonist, agonist or partial agonist of said receptor.
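Step (g) classifies a compound by its efficacy at the constitutively activated receptor. One way to picture this, offered as a hypothetical sketch rather than the method of the claims, is to place the compound's signal within a window running from the receptor's constitutive (basal) signal to the signal produced by a reference full agonist. The availability of such a reference agonist (which may be lacking for an orphan receptor), the 10% tolerance, and the function name are all assumptions.

```python
def classify_compound(basal: float, with_compound: float, full_agonist_max: float,
                      tolerance: float = 0.1) -> str:
    """Hypothetical efficacy call at a constitutively active receptor.

    basal:            signal of the receptor alone (constitutive activity)
    with_compound:    signal after contacting the candidate compound
    full_agonist_max: signal with an assumed reference full agonist
    tolerance:        assumed fraction of the basal-to-max window treated as noise
    """
    window = full_agonist_max - basal
    if window <= 0:
        raise ValueError("full_agonist_max must exceed basal")
    relative = (with_compound - basal) / window  # 0 = basal, 1 = full agonist
    if relative < -tolerance:
        return "inverse agonist"        # reduces constitutive activity
    if relative >= 1 - tolerance:
        return "agonist"                # reaches the full-agonist response
    if relative > tolerance:
        return "partial agonist"        # increases activity, but submaximally
    return "no significant effect"


print(classify_compound(100.0, 50.0, 200.0))  # inverse agonist
```

Because the receptor is already active without ligand, a decrease below basal is directly observable, which is what allows inverse agonists to be identified in the same assay as agonists and partial agonists.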



33. The method of claim 32 wherein the non-endogenous amino acid of step (d) is lysine. 

34. A compound directly identified by the method of claim 32. 

35. The method of claim 32 wherein the directly identified compound is an inverse 
agonist. 

36. The method of claim 32 wherein the directly identified compound is an agonist.

37. The method of claim 32 wherein the directly identified compound is a partial agonist. 

38. A composition comprising the inverse agonist of claim 35. 

39. A composition comprising the agonist of claim 36. 

40. A composition comprising the partial agonist of claim 37. 

41. A method for directly identifying an inverse agonist to a non-endogenous, constitutively activated human G protein coupled receptor ("GPCR"), said GPCR comprising a transmembrane-6 region and an intracellular loop-3 region, comprising the steps of:

(a) selecting an endogenous human GPCR;

(b) identifying a proline residue within the transmembrane-6 region of the GPCR of step (a);

(c) identifying, in a carboxy-terminus to amino-terminus direction, the endogenous, 16th amino acid residue from the proline residue of step (b);

(d) altering the endogenous amino acid of step (c) to a non-endogenous lysine residue;

(e) confirming that the non-endogenous GPCR of step (d) is constitutively active;

(f) contacting a candidate compound with the non-endogenous, constitutively-activated GPCR of step (e); and

(g) determining, by measurement of the compound efficacy at said contacted receptor, whether said compound is an inverse agonist of said receptor.

42. An inverse agonist directly identified by the method of claim 37.

43. A composition comprising an inverse agonist of claim 38.
pCMV Sequence and Restriction Sites

[Figure: annotated double-stranded nucleotide sequence of the pCMV vector with six-frame translations and marked restriction sites, including EcoR V, Hind III, EcoR I, Pst I, Sma I, BamH I, Spe I, Xba I, Not I, Sac I, BssH II, Hpa I, Sph I, Nco I, Stu I and Pvu II.]

A. LPVV. I77IREGLPSGPSAA«IPRD 
PDSPSCR.LRYGRAYHLAPVLC YRE7 
L7PRRVDNYDTGCL7IWPQCCND7AR 
1 H ^ H — • i \ i ! 1 \ 1 1 [. 

QSG7TYIVVIRSPKGDPGLAAI ICRS 
GSEGDHLYSRYPLA. WRAG7SCHYRSV 
RVGRR7SL. SVPPSVMQGWHQLSVALG 

Bgll 

^Hae III Ava II 

CCACGC7CACCGGC7CCAGA777A7CAGCAA7AAACCAGCCAGCC6GAAGGGCCGAGCGCAGAAG7GG7CC7GCAAC777 

' 1 1 ' i 1 ^ ^ i ^ 1 1 ■ ■ ■ I 2720 

GG7GCGAG7GGCCGAGG7C7AAA7AG7CG77A7T7CG7CGC7CGGCC77CCCGGC7CGCG7C77CACCAGGACG77GAAA 

PRSPAPDLSAIN0PACRAERRSGPA7L 
HAHRLQIYQQ. 7SQPEGPSAEVVLQL 
PTLTGSRFISNKPASRtCGRAQKWSCNF 

^ ! 1 1 1 P i ' I ^ 1 ■ ■ ' I 1 h 

GREGAGSKDA IFWGAPLASRLLPGAVK 
WA. RSWI. . CYVLW6SPGLAST7RCS. 
VSVPELNILLLGALRFPRACFHDQLK 

Asel ^ct 1 fsp I 

• i i 

A7CCGCC7CCA7CCAG7C7 A77AAT7G7TGCCGGGAAGC7ACAG7AAG7AG77CGCCAG77AA7AG777GCGCAACG77G 

' ^ ' ' ^ f " H ' ' ' ' ■ ' i — • K 2800 

TAGGC6GAGG7AGG7CAGA7AAT7AACAACGGCCC77CGA7C7CA77CA7CAAGCGG7CAA77A7CAAACCCGr7GCAAC 

SASIQSINCCREARVSSSPVNSLRNV 
rPPPSSLLIVAGKLE. VVRQLIVCA7L 
' R L H P y Y . L P G S S K F A S . F A Q R C 

DAEMWDILQQRSAL7LLEG7LLKRLT7 

CGGDLRNITAPFSSY7TRWNi7aAVN 
'RRWGT. . NNGPL - LLYNAL. YNACRQ 

"rGCCAT7GC7ACAGGCA7CG TGG7GTCACGCTCGTCGT7TGG7A7GGCT7CAT7CAGC7CCGGT7CCCAACGA7CAAGG 
" ' ' I - ■ I ■ ■ ■ t -.4 ■■ - — ■ ♦ ■ . ■ j I I I ■ ■ ■ j f ■■■■ j ■■■■),. I , 2680 

»ACGG7AACGA7G7CCG7AGCACCACAG7GCGAGCAGCAAACCA7ACCGAAG7AAGTCGAGGCCAAGGC77GC7A6TTCC 

'AIA7GIVVSRSSFGMASFSSGSQRSR 
1-PLLQASWCHARRLVWLHSAPVPNDQG 

CHCYRHRGV7LVVWYGF1QLRFPTIK 
" f ^ i ' ^— ^ ' t f— . \ _H ■ ■ t 

AnAVPM77DREDNP lAENLEPEWRDL 
GNSCAOHH. ARRK7HS. EAG7GLS. P 
QWO. LCRP7VS77Q p J^UJ^- ^ N C V I L A 
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Ava II Pvu I Hae III 

CGAGT7ACATGATCCCCCATGTTGTGCAAAAAACCGGTTAGCTCCTTCGGTCCTCCGA7CGTTGTCAGAAGTAAGTTGCC 

— 1 ' ' ' ' ' ■ * ' ^ 1 ' 1 I I 2960 

GCTCAATGTACTAGGGGGTACAACACGTTTTTTCGCCAATCGAGGAAGCCAGGAGGCTAGCAACAGTCTTCATTCAACCG 

RVT. SPMLCKKAVSSFGPPIVVRSKLA 
ELHDPPCCAKKRLAPSVLRSLSEVSW 
ASYMIPHVVOKSG. LLRSSDRCQK. VG 
^ ! ^— H ^ 1 1 ^ ^ ^ 1 , i t 

RTVHDGnNHLFATLEKPGGITTLLLNA 
SNCSGGHDAFFRNAGETRRDNDSTLQG 
L. MIGWTTCFLP. SRRDESRQ. PYTP 

CGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTG7CATGCCATCCG7AAGATGCTTTTCTGTGA 
^ i ' 1 ^ i < i 1 , H ■ — ^ ] 1 h 30^0 

GCG7CACAA7AG7GAG7ACCAA7ACCGTCG7GACG7AT7AAGAGAA7GACAG7ACGG7AGGCA7TC7ACGAAAAGACAC7 

AVLSLMVMAALHNSL7Vr. PSVRCFSV 
PGCYHSWLWQHCIILLLSCHP DAFL 
RSVI7HGYGS7A. FSYCHAIRKMLFCD 

— — — \ ' ' ■ ■ ' ' ' I * ' ' ' i — ^ . ■ r 1 . h 

A7NDSM7IAASCLERV7MGD7LHKE7V 
CH. . EHNHCCOhlRKSDHWGYSAKRH 
RL7IV. P. PLVAYNE. Q. AMRLISKQS 

Rsa I 

,Sca I Nd I Hinc M 

C7GG7GAG7AC7CAACCAAG7CA77C7GAGAA7AG7G7ATGCBGCGACCGAG77GC7CTTGCCCGGCG7CAACACGGGA7 

" " ' " " ' ^ ' ' 1 ^ — ~^ 1 ^ — - — ' 1 ■ I 1— — I- 3120 

GACCAC7CA7GAG77GG77CAG7AAGAC7C77A7CACA7ACGCCGC7GGCTCAACGAGAACGGGCCGCAGT7G7GCCC7A 

'TGEYSYKSF. E. CMRRPSCSCPAS7RD 
t-VS7QPSHSENSVCGDRVALARRQHG 1 
V. VLNQVILR1VYAA7ELLLPGVN7G 

H 1 1 ^ 1 1 ^ 1 ■ ■ I 1 ^ 1- 

PSYEVLDNQSYHIRRGLDEQGADVRS 
S7LV. GL. ESFL7HPSR7ARARR. CPl 
QH7SLW7nRLI7YAAVSKSKGP7LVPY 

Pra I Xmn t 

AA7ACCGCGCCACA7AGCAGAAC777AAAAG7 GC7CA7CA7TGGAAAACG77CT7CGGGGCGAAAAC7C7CAACGA7C77 

" ' ' j .-,1 ■ I ,,. t .,.,(,, , I t , I . . I . . ( , , I , , , ) .. t I I I 3200 

77A7GGCGCGG7G7A7CG7C77CAAA7T77CACGAG7AG7AACC7777GCAAGAAGCCCCGC7777GAGAG77CC7AGAA 

'^TAPHSR7LKVL1 IGKRSSGRKLSR IL 
IPRHIAEL, KCSSLEKVLRGENSQGS 
• YRA7. QNFKSAHHWK7FF6AK7LKDL 

- " ' ■ ■ ^ ^ ^ f ^ ^ • 1 ^ I 1 — + 

LVAGCLLV KFYShMPFREEPRFSEL IK 
IGRWMASS. FHEDNSF7RRPSFE PD 
I^RAVYCFKLLA. . QFVKKPAFVRLSR 
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ApaLI 

ACCECTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAAC7GATCTTCAGCATCTTTTACTTTCACCAGCGTTT 

^ 1 1 ■ 1 ' h— I ^ ( . 1 1 h 3280 

TGGCGACAACTCTAGGTCAACCTACATTGCCTGACCACGTCGGTTGACTAGAACTCGTACAAAATGAAAGTCG7CGCAAA 

PLLRSSSK. PTRAPN, SSASTTFTSV 
YRC. DPVRCNPLVHPTDLCHLLLSPAF 
TAVEIOFDVTHSCTOLIFSIFYFHQRF 

. — I i 1 ! } 1 1 1 1 ! ^ 1 1 ! 1 H- 

GSNLOLE lYGVRACLQDEADKVKVLTE 
RQQSGTRHLGSTCGVSR. CRKSEGAN 
VATSIWNSTVWEHVWSiKLMK. K . WRK 

CTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAA7AAGGGCGACACGGAAA7GTTGAATACTCATACTC 
1 1 ^ 1 \ 1 1 1 \ \ 1 1 , i , ^ 3360 

gacccactcgtttttgtccttccgttttacggcgtt7tttcccttattcccgc7gtgcctttacaacttatgagtatgag 

sg.ak7grqnaakkcira7rkc. ilil 
lgeqkqegkhpokre. grhgnveysys 
wvsicnricakcrkicgnkcd7emln7h7 

I I 1 1 f I * ' 1 ' — H 1 1 \ h 

PHAFVPLCFAAFFPILAVRFHQISnS 
RPSCFCSPL ICCFLSYPRCPF7SYEYE 
07LLFLFAFHRLFPFLPSVSINFV. VR 

Hinc II Spe I Asei 

: I t 

I : : 

77CC77777CAA7ATTAT7GAAGCA777A7CAGGG77A77G7C7CA7GCGCG7TGACA77GA77A77CAC7AG77ATTAA 

' ' I ' ■ ■ ■ ■ I ■ ! 1 i 1 t I i ■ ■ ■ ■ I h 34^10 

AAGGAAAAAG7TA7AA7AAC77CG7AAA7AG7CCCAA7AACAGAG7ACGCGCAACTG7AAC7AA7AAC7GA7CAA7AA77 

FLFQYY. SIYQGYCLMRVDIDY. LVIN 
SFFNl IEAFJRYIVSCAL7LI ID. LL 
LPFSILLKHLSGLLSHAR. H. LL7SY. 

— _ 1 1 1 1 ! 1 ^ • 1 1 — +— » 1 

KRK. Y. QLM. . P. QR«R7Sf1S. QS7IL 
EKKLI ISANIL7I7EHANVN1 IS. NNI 
GKEIKNFCKDPNND, ARQCQNNVL. . 

Hae III Bgl I 

TAG7AATCAAT7ACGCGG7CA7TAG77CA7AGCCCA7A7A7GGAGT7CCGCG7TACA7AAC77ACCG7AAA7GGCCCGCC 
1 ' ■ ■ ! \ i \ i H i \ 1 ! 1 1 1 1 I 3520 

ATCA7TAG77AA7GCCCCAG7AA7CAAG7ATCGGG7ATA7ACC7CAAGft,CGCAA7G7A77GAA7GCCAT77ACCGGGCGG 

SNQLRGH. FIAHIWSSALHNLR. flAR 
IVINYCVISS. PIYGVPRYI7YGKWPA 
. . SI7CSLVHSPYMEFRV7. L7VNGPP 

■ ■ ' ' ' ■ ■ ■ i ' i 1 1 ' ■ . t ■ . ■ ■ I . . . . I \ 1 1 1 i t 

LL. NRP. . NMAWrHLEANCLKRYIARR 
TiL. P7nLEY6nYP7GR. «V. PLHGA 
tYDlVPDNT. LGYISNR7VYSVTrPGG 
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Aat II 

TGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC 



I I [ 



ACCGACTGGCGGGTTGCTGGGGGCGGGTAACTGCACTTATTACTGCATACAAGGGTATCATTGCGGTTATCCCTGAAAGG 



-t- 36(X) 



LADRPT7PAH. RQ. . RflFP. . RQ. GLS 
WLTAQRPPP IDVNNDVCSHSNANRDFP 
G. PPNDPRPLTSIhTYVPIVTPIGTF 



ASRCVVGAWOR. YHRlNGtYRWYPSE 
OSVAWRGCGMSTLLSTHEWLLALLSKG 
PQGGLSGRGNVDIIVYTGMTVGIPVKW 

.Aat JI Bgl I .Rsa I Nde I Rsa I 

ATTGACGTCAA7GGGTCGACTATTTACGGTAAACTGCCCACTTGCCAGTACATCAAGTGTA7CATATGCCAAGTACGCCC 



-f- , H 



7AACTGCAGTTACCCACCTGATAAATGCCATTTGACGGGTGAACCGTCATGTAGTTCACATACTATACGGTTCATGCCCG 



-+ 3680 



lOVNGWTIYGKLPTWOYIKCI ICQVRP 
LTSMGGLFTVNCPLGSTSSVSYAKYA 
H. RQWVDYLR. 7AHLAVHQVYHMPSTP 



H . — t ■ ■ ■ ' I 



MSTLPHVI. PLSGVGCYMLHIMHWTRG 
NVDIPPSNVTFQGSPLVDLTDYALYAG 
OR. HTS. KRYVAWKATC. TY. IGLVG 

Aat fl h^ae 111 Bgl I Rsa t 

CCTATTGACGTCAATGACGG7AAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCA 



GGA7AACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCATGTAC7GGAATACCCTGAAAGCATGAACCGT 



3760 



LL7SM7VNCPPCIMPS7. PYG7FLLG 
PY.RQ. R. MARLALCPVHDLMGLSYLA 
PIDVNDGKWPAWHYAQYM7LWDFP7WQ 



H , H 



RNVDIVTFPGGPMIGLVHG. PVKRSPL 
OR. HRYJARRANHG7CSRIPSE. KA 
GIS7LSPLHGAQC. AWYMVKHSKGVQC 

BsaA I Nco I 

Rsa I SnaB I .Sty I Rsa I 

^ i 1 i 

G7ACA7C7AC G7A77AGTCA7CGC7A77ACCA7GG7GA7GCGG77T7G6tAG7ACA7CAA7GGGCG7GGA7AGCGGTT7G 

' ' ' ' ' ^ — — * 1 ■ I ' — ' — ' f ' ' • • i 1 ' 1 38*10 

:A7G7AGA7GCA7AA7CAG7AGCGA7AA7GC7ACCAC7ACGCCAAAACCG7CA7G7AG77ACCCGCACC7A7CGCCAAAC 

5TSTY. SSLLPW.CGFGS7SriGVDSGL 
VHIRISHRYYHGDAVLAVHOVAWIAV 
YlYVLVIAlYMVriRFWOYINGRG, RF 

' t ■ ■ > f ^ 1 ' ■ ■ ■ t ^ i ^ j 1 1 ^ K 

VDVY. DDSNGHHHPKPLVDIP7SLPK 
rCRRlL. . WPSA7KATC. HAHIA7Q 

l^M - TNTttAIVMTIRNQCYMLPRPYRNS 
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Aat II 

( 
I 

ACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGCCACCAAAATCAACGGGAC TTTCCA 
TGAG7CCCCCTAAAGGTTCAGAGGTGGGGTAACTCCAGT7ACCC7CAAACAAAACCGTGGTTTTAGTTGCCCTGAM ^^^^ 



^HGDFQVSTPLTSMCVCFGTKlNCTFn 
^ ^ ^ I S K S P P H . R Q W E F V L A P K S T C I % 
" ^ f ^, L H P D y N G S F 'h 'o \ % \ % % 

V • P S K V t' E V C N V D 1 P 't q' K P V V I L P V 'iC u' 
'p ^ ^ ^ ' ° ' ^ ° - H S N T K A G F D V P S E L 

E R P N G L R W C M S T L P L K N Q C W F R 5 K C 

* .Sac I 

AAATGTCGTAACAACTCCCCCCCATTGACCCAAATGGGCGGTACGCGTGTACGGTGGGAGGTC TATATAACC^ 
TTTACACCATTGTTGAGGCGGCGTAACTGCGTTTACCCGCCATCCGCACATGCCACCCTCCAGATATATTC^^ 

"^^^^PPH.RKWAVGVYGGRSI AFI 
K M S Q L R P I D A N 6 R . A C T V G C L Y ' K Q S S 

^ R .N N 3 A P L T Q n G G R R V R W E V Y I S R A L 

''TTVVGCWQRLHA T P T Y P P L D I y 'a «i ' <i r 
I D Y C S R G M S A F P R Y A H V T P P R 'l c L E 
" R L L E A C N V C I P P L R T R H S T [ L L A R 

Asel 



ctggctaactagagaacccactgcttaactggcttatcgaaat'taata cgactcactatagggagaccc 

GACCCATTGATCTCTTGGGTGACGAATTGACCGAA7AGCTTTAATTATGCTGAG7GATATCCCTC7GGG 

^, ^, • ^„ N P L L Kl W L I E I N 7 7 H Y R E 7 

'-*"-'^THCL7GLSKLlRL7I C R l 
V- ^ ^ ^ P T A . L A Y R N . r 's \ \ D 



H ^ ^ . H 



P 

GDP 



P . SSFGSSLQSISILVV ..LSVW 
n *c ^ ■ . ^ ^ Q V P K D F N J R S V I P L G 

° ^ ^ L S G V A . S A . R F . Y S E S Y P S G 
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Behan, Dominic P. 

Chalmers, Derek T. 
Liaw, Chen W. 

(ii) TITLE OF INVENTION: Non-Endogenous, Constitutively

Activated Human G Protein-Coupled
Orphan Receptors

(iii) NUMBER OF SEQUENCES: 280

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Arena Pharmaceuticals, Inc.

(B) STREET: 6166 Nancy Ridge Drive
(C) CITY: San Diego

(D) STATE: CA

(E) COUNTRY: USA

(F) ZIP: 92122

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Burgoon, Richard P. 

(B) REGISTRATION NUMBER: 34,787

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: < 6 1 9 ) 4 5 3 - 7 1? 00 

(B) TELEFAX: (619) 453-7210

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1068 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



ATGGAAGATT TGGAGGAAAC ATTATTTGAA GAATTTGAAA ACTATTCCTA TGACCTAGAC 60

TATTACTCTC TGGAGTCTGA TTTGGAGGAG AAAGTCCAGC TGGGAGTTGT TCACTGGGTC 120

TCCCTGGTGT TATATTGTTT GGCTTTTGTT CTGGGAATTC CAGGAAATGC CATCGTCATT 180

TGGTTCACGG GGCTCAAGTG GAAGAAGACA GTCACCACTC TGTGGTTCCT CAATCTAGCC 240

ATTGCGGATT TCATTTTTCT TCTCTTTCTG CCCCTGTACA TCTCCTATGT GGCCATGAAT 300

TTCCACTGGC CCTTTGGCAT CTGGCTGTGC AAAGCCAATT CCTTCACTGC CCAGTTGAAC 360

ATGTTTGCCA GTGTTTTTTT CCTGACAGTG ATCAGCCTGG ACCACTATAT CCACTTGATC 420

CATCCTGTCT TATCTCATCG GCATCGAACC CTCAAGAACT CTCTGATTGT CATTATATTC 480

ATCTGGCTTT TGGCTTCTCT AATTGGCGGI- CCTGCCCTGT ACTTCCGGGA CACTGTGGAG 540

TTCAATAATC ATACTCTTTG CTATAACAAT TTTCAGAAGC ATGATCCTGA CCTCACTTTG 600

ATCAGGCACC ATGTTCTGAC TTGGGTGAA7v TTTATCATTG GCTATCTCTT CCCTTTGCTA 660

ACAATGAGTA TTTGCTACTT GTGTCTCATC TTCAAGGTGA AGAAGCGAAC AGTCCTGATC 720

TCCAGTAGGC ATTTCTGGAC AATTCTGGTT GTGGTTGTGG CCTTTGTGGT TTGCTGGACT 780

CCTTATCACC TGTTTAGCAT TTGGGAGCTC ACCATTCACC ACAATAGCTA TTCCCACCAT 840

GTGATGCAGG CTGGAATCCC CCTCTCCACT GGTTTGGCAT TCCTCAATAG TTGCTTGAAC 900

CCCATCCTTT ATGTCCTAAT TAGTAAGAAG TTCCAAGCTC GCTTCCGGTC CTCAGTTGCT 960

GAGATACTCA AGTACACACT GTGGGAAGTC AGCTGTTCTG GCACAGTGAG TGAACAGCTC 1020

AGGAACTCAG AAACCAAGAA TCTGTGTCTC CTGGAAACAG CTCAATAA 1068

(3) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Glu Asp Leu Glu Glu Thr Leu Phe Glu Glu Phe Glu Asn Tyr Ser
1 5 10 15

Tyr Asp Leu Asp Tyr Tyr Ser Leu Glu Ser Asp Leu Glu Glu Lys Val
20 25 30

Gln Leu Gly Val Val His Trp Val Ser Leu Val Leu Tyr Cys Leu Ala
35 40 45

Phe Val Leu Gly Ile Pro Gly Asn Ala Ile Val Ile Trp Phe Thr Gly
50 55 60

Leu Lys Trp Lys Lys Thr Val Thr Thr Leu Trp Phe Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Phe Ile Phe Leu Leu Phe Leu Pro Leu Tyr Ile Ser Tyr
85 90 95

Val Ala Met Asn Phe His Trp Pro Phe Gly Ile Trp Leu Cys Lys Ala
100 105 110

Asn Ser Phe Thr Ala Gln Leu Asn Met Phe Ala Ser Val Phe Phe Leu
115 120 125

Thr Val Ile Ser Leu Asp His Tyr Ile His Leu Ile His Pro Val Leu
130 135 140

Ser His Arg His Arg Thr Leu Lys Asn Ser Leu Ile Val Ile Ile Phe
145 150 155 160

Ile Trp Leu Leu Ala Ser Leu Ile Gly Gly Pro Ala Leu Tyr Phe Arg
165 170 175

Asp Thr Val Glu Phe Asn Asn His Thr Leu Cys Tyr Asn Asn Phe Gln
180 185 190

Lys His Asp Pro Asp Leu Thr Leu Ile Arg His His Val Leu Thr Trp
195 200 205

Val Lys Phe Ile Ile Gly Tyr Leu Phe Pro Leu Leu Thr Met Ser Ile
210 215 220

Cys Tyr Leu Cys Leu Ile Phe Lys Val Lys Lys Arg Thr Val Leu Ile
225 230 235 240

Ser Ser Arg His Phe Trp Thr Ile Leu Val Val Val Val Ala Phe Val
245 250 255

Val Cys Trp Thr Pro Tyr His Leu Phe Ser Ile Trp Glu Leu Thr Ile
260 265 270

His His Asn Ser Tyr Ser His His Val Met Gln Ala Gly Ile Pro Leu
275 280 285

Ser Thr Gly Leu Ala Phe Leu Asn Ser Cys Leu Asn Pro Ile Leu Tyr
290 295 300

Val Leu Ile Ser Lys Lys Phe Gln Ala Arg Phe Arg Ser Ser Val Ala
305 310 315 320

Glu Ile Leu Lys Tyr Thr Leu Trp Glu Val Ser Cys Ser Gly Thr Val
325 330 335

Ser Glu Gln Leu Arg Asn Ser Glu Thr Lys Asn Leu Cys Leu Leu Glu
340 345 350

Thr Ala Gln
355
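As an illustrative cross-check (not part of the original listing), the nucleotide sequence of SEQ ID NO:1 can be translated codon by codon with the standard genetic code and compared against SEQ ID NO:2. The sketch below assumes the standard nuclear codon table and a reading frame that starts at the first nucleotide; the function `translate` and its `stop_at_stop` parameter are hypothetical names introduced here.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons:
# index = 16 * (first base) + 4 * (second base) + (third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    seq = "".join(dna.split()).upper()  # drop the spacing used in the listing
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]  # KeyError on non-ACGT characters
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


# First numbered line (nucleotides 1 to 60) of SEQ ID NO:1, spaced as listed.
first_line = "ATGGAAGATT TGGAGGAAAC ATTATTTGAA GAATTTGAAA ACTATTCCTA TGACCTAGAC"
print(translate(first_line))  # MEDLEETLFEEFENYSYDLD
```

The twenty residues produced match positions 1 to 20 of SEQ ID NO:2 (Met Glu Asp Leu ... Asp Leu Asp). Characters left over from scanning errors, such as anything other than A, C, G or T, raise a KeyError instead of being silently skipped.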

(4) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1089 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ATGGGCAACC ACACGTGGGA GGGCTGCCAC GTGGACTCGC GCGTGGACCA CCTCTTTCCG 60

CCATCCCTCT ACATCTTTGT CATCGGCGTG GGGCTGCCCA CCAACTGCCT GGCTCTGTGG 120

GCGGCCTACC GCCAGGTGCA ACAGCGCAAC GA<:;CTGGGCG TCTACCTGAT GAACCTCAG : 180

ATCGCCGACC TGCTGTACAT CTGCACGCTG CCGCTGTGGG TGGACTACTT CCTGCACCAC 240

GACAACTGGA TCCACGGCCC CGGGTCCTGC AAGCTCTTTG GGTTCATCTT CTACACCAAT 300

ATCTACATCA GCATCGCCTT CCTGTGCTGC ATGTCGGTGG ACCGCTACCT GGCTGTGGCC 360

CACCCACTCC GCTTCGCCCG CCTGCGCCGC GTCAAGACCG GCGTGGCCGT GAGCTCCGTG 420

GTCTGGGCCA CGGAGCTGGG CGCCAACTCG GCGCCCCTGT TCCATGACGA GCTCTTCCGA 480

GACCGCTACA ACCACACCTT CTGCTTTGAG AAGTTCCCCA TGGAAGGCTG GGTGGCCTGG 540

ATGAACCTCT ATCGGGTGTT CGTGGGCTTC CTCTTCCCGT GGGCGCTCAT GCTGCTGTCG 600

TACCGGGGCA TCCTGCGGGC CGTGCGGGGC AGCGTGTCCA CCGAGCGCCA GGAGAAGGCC 660

AAGATCAAGC GGCTGGCCCT CAGCCTCATC GCCATCGTGC TGGTCTGCTT TGCGCCCTAT 720

CACGTGCTCT TGCTGTCCCG CAGCGCCATC TACCTGGGCC GCCCCTGGGA CTGCGGCTTC 780

GAGGAGCGCG TCTTTTCTGC ATACCACAGC TCACTGGCTT TCACCAGCCT CAACTGTGTG 840

GCGGACCCCA TCCTCTACTG CCTGGTCAAC GAGGGCGCCC GCAGCGATGT GGCCAAGGCC 900

CTGCACAACC TGCTCCGCTT TCTGGCCAGC GACAAGCCCC AGGAGATGGC CAATGCCTCG 960

CTCACCCTGG AGACCCCACT CACCTCCAAG AGGAACAGCA CAGCCAAAGC CATGACTGGC 1020

AGCTGGGCGG CCACTCCGCC TTCCCAGGGG GACCAGGTGC AGCTGAAGAT GCTGCCGCCA 1080

GCACAATGA 1089
(5) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Gly Asn His Thr Trp Glu Gly Cys His Val Asp Ser Arg Val Asp
1 5 10 15

His Leu Phe Pro Pro Ser Leu Tyr Ile Phe Val Ile Gly Val Gly Leu
20 25 30

Pro Thr Asn Cys Leu Ala Leu Trp Ala Ala Tyr Arg Gln Val Gln Gln
35 40 45

Arg Asn Glu Leu Gly Val Tyr Leu Met Asn Leu Ser Ile Ala Asp Leu
50 55 60

Leu Tyr Ile Cys Thr Leu Pro Leu Trp Val Asp Tyr Phe Leu His His
65 70 75 80

Asp Asn Trp Ile His Gly Pro Gly Ser Cys Lys Leu Phe Gly Phe Ile
85 90 95

Phe Tyr Thr Asn Ile Tyr Ile Ser Ile Ala Phe Leu Cys Cys Ile Ser
100 105 110

Val Asp Arg Tyr Leu Ala Val Ala His Pro Leu Arg Phe Ala Arg Leu
115 120 125

Arg Arg Val Lys Thr Ala Val Ala Val Ser Ser Val Val Trp Ala Thr
130 135 140

Glu Leu Gly Ala Asn Ser Ala Pro Leu Phe His Asp Glu Leu Phe Arg
145 150 155 160

Asp Arg Tyr Asn His Thr Phe Cys Phe Glu Lys Phe Pro Met Glu Gly
165 170 175

Trp Val Ala Trp Met Asn Leu Tyr Arg Val Phe Val Gly Phe Leu Phe
180 185 190

Pro Trp Ala Leu Met Leu Leu Ser Tyr Arg Gly Ile Leu Arg Ala Val
195 200 205

Arg Gly Ser Val Ser Thr Glu Arg Gln Glu Lys Ala Lys Ile Lys Arg
210 215 220

Leu Ala Leu Ser Leu Ile Ala Ile Val Leu Val Cys Phe Ala Pro Tyr
225 230 235 240

His Val Leu Leu Leu Ser Arg Ser Ala Ile Tyr Leu Gly Arg Pro Trp
245 250 255

Asp Cys Gly Phe Glu Glu Arg Val Phe Ser Ala Tyr His Ser Ser Leu
260 265 270

Ala Phe Thr Ser Leu Asn Cys Val Ala Asp Pro Ile Leu Tyr Cys Leu
275 280 285

Val Asn Glu Gly Ala Arg Ser Asp Val Ala Lys Ala Leu His Asn Leu
290 295 300

Leu Arg Phe Leu Ala Ser Asp Lys Pro Gln Glu Met Ala Asn Ala Ser
305 310 315 320

Leu Thr Leu Glu Thr Pro Leu Thr Ser Lys Arg Asn Ser Thr Ala Lys
325 330 335

Ala Met Thr Gly Ser Trp Ala Ala Thr Pro Pro Ser Gln Gly Asp Gln
340 345 350

Val Gln Leu Lys Met Leu Pro Pro Ala Gln
355 360
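The LENGTH fields allow a simple consistency check. This is a sketch added here and is not part of the listing. A full-length coding sequence that ends in a single stop codon has 3 × (protein length + 1) base pairs. The pairs below are taken from the listing. For SEQ ID NO:3, the value used is its final position number, 1089. The function name `expected_protein_length` is hypothetical.

```python
def expected_protein_length(cds_bp: int) -> int:
    """Residues encoded by a full-length CDS that ends with one stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError(f"{cds_bp} bp is not a whole number of codons")
    return cds_bp // 3 - 1  # subtract the stop codon


# (DNA SEQ ID NO, length in bp) -> (protein SEQ ID NO, length in residues)
pairs = {
    (1, 1068): (2, 355),
    (3, 1089): (4, 362),
    (7, 1002): (8, 333),
    (11, 987): (12, 328),
    (15, 1002): (16, 333),
    (19, 1107): (20, 368),
}

for (dna_id, bp), (prot_id, residues) in pairs.items():
    ok = expected_protein_length(bp) == residues
    status = "consistent" if ok else "MISMATCH"
    print(f"SEQ ID NO:{dna_id} ({bp} bp) -> SEQ ID NO:{prot_id} ({residues} aa): {status}")
```

Every pair passes the check. A length of 1069 bp would not be a whole number of codons. By contrast, 1089 bp agrees with both the last position number of SEQ ID NO:3 and the 362 residues of SEQ ID NO:4.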

(6) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
TATGAATTCA GATGCTCTAA ACGTCCCTGC 

(7) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
TCCGGATCCA CCTGCACCTG CGCCTGCACC 

(8) INFORMATION FOR SEQ ID NO: 7: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

ATGGAGTCCT CAGGCAACCC AGAGAGCACC ACCTTTTTTT ACTATGACCT TCAGAGCCAG 60

CCGTGTGAGA ACCAGGCCTG GGTCTTTGCT ACCCTCGCCA C^^ACTGTCCT GTACTGCCTG 120

GTGTTTCTCC TCAGCCTAGT GGGCAACAGC CTGGTCCT'^T GGGTCCTGGT GAAGTATGAG 180

AGCCTGGAGT CCCTCACCAA CATCTTCATC CTCAACCTI^T GCCTCTCAGA CCTGGTGTTC 240

GCCTGCTTGT TGCCTGTGTG GATCTCCCCA TACCACTGGG GCTGGGTGCT GGGAGACTTC 300

CTCTGCAAAC TCCTCAATAT GATCTTCTCC ATCAGCCT^T ACAGCAGCAT CTTCTTCCTG 360

ACCATCATGA CCATCCACCG CTACCTGTCG GTAGTGAG :C C :XTt:TCCAC CCTGCGCGTC 420

CCCACCCTCC GCTGCCGGGT GCTGGTGACO ATGGCTGTGT GGGTAGCCAG CATCCTGTCC 480

TCCATCCTCG ACACCATCTT CCACAAGGTG CTTTCTTCGG GCTGTGATTA TTCCGAACTC 540

ACGTGGTACC TCACCTCCGT CTACCAGCAC AACCTCTT JT TCCTGCTGTC CCTGGGGATT 600

ATCCTGTTCT GCTACGTGGA GATCCTCAGG ACCCTGTTJC GGTCACGCTC CAAGCGGCGC 660

CACCGCACGG TCAAGCTCAT CTTCGCCATC GTGGTGGC':": AGTTCCTCAG CTGGGGTCCC 720

TACAACTTCA CCCTGTTTCT GCAGACGCTG TTTCGGAC :r AGATCATCCG GAGCTGCGAG 780

GCCAAACAGC AGCTAGAATA CGCCCTGCTC ATCTGCCGGA ACCTCGGCTT CTCCCACTGC 840

TGCTTTAACC CGGTGCTCTA TGTCTTCGT*^ GGGGTCAAGT TCCGCACACA CCTGAAACAT 900

GTTCTCCGGC AGTTCTGGTT CTGCCGGCTG CAGGCACCGA GCCCAGCCTC GATCCCCCAC 960

TCCCCTGGTG CCTTCGCCTA TGAGGGCGCC TCCTTCTACT GA 1002
(9) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Met Glu Ser Ser Gly Asn Pro Glu Ser Thr Thr Phe Phe Tyr Tyr Asp
1 5 10 15

Leu Gln Ser Gln Pro Cys Glu Asn Gln Ala Trp Val Phe Ala Thr Leu
20 25 30

Ala Thr Thr Val Leu Tyr Cys Leu Val Phe Leu Leu Ser Leu Val Gly
35 40 45

Asn Ser Leu Val Leu Trp Val Leu Val Lys Tyr Glu Ser Leu Glu Ser
50 55 60

Leu Thr Asn Ile Phe Ile Leu Asn Leu Cys Leu Ser Asp Leu Val Phe
65 70 75 80

Ala Cys Leu Leu Pro Val Trp Ile Ser Pro Tyr His Trp Gly Trp Val
85 90 95

Leu Gly Asp Phe Leu Cys Lys Leu Leu Asn Met Ile Phe Ser Ile Ser
100 105 110

Leu Tyr Ser Ser Ile Phe Phe Leu Thr Ile Met Thr Ile His Arg Tyr
115 120 125

Leu Ser Val Val Ser Pro Leu Ser Thr Leu Arg Val Pro Thr Leu Arg
130 135 140

Cys Arg Val Leu Val Thr Met Ala Val Trp Val Ala Ser Ile Leu Ser
145 150 155 160

Ser Ile Leu Asp Thr Ile Phe His Lys Val Leu Ser Ser Gly Cys Asp
165 170 175

Tyr Ser Glu Leu Thr Trp Tyr Leu Thr Ser Val Tyr Gln His Asn Leu
180 185 190

Phe Phe Leu Leu Ser Leu Gly Ile Ile Leu Phe Cys Tyr Val Glu Ile
195 200 205

Leu Arg Thr Leu Phe Arg Ser Arg Ser Lys Arg Arg His Arg Thr Val
210 215 220

Lys Leu Ile Phe Ala Ile Val Val Ala Tyr Phe Leu Ser Trp Gly Pro
225 230 235 240

Tyr Asn Phe Thr Leu Phe Leu Gln Thr Leu Phe Arg Thr Gln Ile Ile
245 250 255

Arg Ser Cys Glu Ala Lys Gln Gln Leu Glu Tyr Ala Leu Leu Ile Cys
260 265 270

Arg Asn Leu Ala Phe Ser His Cys Cys Phe Asn Pro Val Leu Tyr Val
275 280 285

Phe Val Gly Val Lys Phe Arg Thr His Leu Lys His Val Leu Arg Gln
290 295 300

Phe Trp Phe Cys Arg Leu Gln Ala Pro Ser Pro Ala Ser Ile Pro His
305 310 315 320

Ser Pro Gly Ala Phe Ala Tyr Glu Gly Ala Ser Phe Tyr
325 330
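Comparing the protein listings with translations of the DNA listings is easier with one-letter codes. The following is a minimal sketch that assumes the standard IUPAC three-letter abbreviations for the 20 amino acids. The names `THREE_TO_ONE` and `to_one_letter` are hypothetical.

```python
# Standard three-letter to one-letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert a space-separated run of three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in residues.split())


# First row (residues 1 to 16) of SEQ ID NO:8.
row_1 = "Met Glu Ser Ser Gly Asn Pro Glu Ser Thr Thr Phe Phe Tyr Tyr Asp"
print(to_one_letter(row_1))  # MESSGNPESTTFFYYD
```

Converted, the first row of SEQ ID NO:8 becomes MESSGNPESTTFFYYD. Under the standard genetic code, the first 48 nucleotides of SEQ ID NO:7 encode this same peptide. A code missing from the table raises a KeyError. This also catches scanning errors, for example "Gin" for "Gln".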



(10) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GCAAGCTTGG GGGACGCCAG GTCGCCGGCT 30

(11) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GCGGATCCGG ACGCTGGGGG AGTCAGGCTG C 31 

(12) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 987 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

ATGGACAACG CCTCGTTCTC GGAGCCCTGG CCCGCCAACG CATCGGGCCC GGACCCGGCG 60

CTGAGCTGCT CCAACGCGTC GACTCTGGCG CCGCTGCCGG CGCCGCTGGC GGTGGCTGTA 120

CCAGTTGTCT ACGCGGTGAT CTGCGCCGTG GGTCTGGCGG GCAACTCCGC CGTGCTGTAC 180

GTGTTGCTGC GGGCGCCCCG CATGAAGACC GTCACCAACC TGTTCATCCT CAACCTGGCC 240

ATCGCCGACG AGCTCTTCAC GCTGGTGCTG CGCATCAACA TCGCCGACTT CCTGCTGCGG 300

CAGTGGCCCT TCGGGGAGCT CATGTGCAAG CTCATCGTGG CTATCGACCA GTACAACACC 360

TTCTCCAGCC TCTACTTCCT CACCGTCATG AGCGCCGACC GCTACCTGGT GGTGTTGGCC 420

ACTGCGGAGT CGCGCCGGGT GGCCGGCCGC ACCTACAGCG CCGCGCGCGC GGTGAGCCTG 480

GCCGTGTGGG GGATCGTCAC ACTCGTCGTG CTGCCCTTCG CAGTCTTCGC GCGGCTAGAC 540

GACGAGCAGG GCCGGCGCCA GTGCGTGCTA GTCTTTCCGC AGCCCGAGGC CTTCTGGTGG 600

CGCGCGAGCC GCCTCTACAC GCTCGTGCTG GGCTTCGCCA TCCCCGTGTC CACCATGTGT 660

GTCCTCTATA CCACCCTGCT GTGCCGGCTG CATGCGATGG GGCTGGACAG CCACGCCAAG 720

GCCCTGGAGC GCGCCAAGAA GCGGGTGACC TTCCTGGT3G TGGCAATCCT GGCGGTGTGC 780

CTCCTCTGCT GGACGCCCTA CCACCTGAGC ACCGTGGTGG CGCTCAGGAC CGACCTCCCG 840

CAGACGCCGG TGGTCATCGG TATCTCCTAC TTCATCACCA GCCTGACGTA CGCCAACAGC 900

TGCCTCAACC CCTTCCTGTA CGCCTTCGTG GACCCCAGCT TGCGCAGGAA CGTCGGCCAG 960

CTGATAACTT GCGGCGCGGG AGCCTGA 987
(13) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 328 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Asp Asn Ala Ser Phe Ser Glu Pro Trp Pro Ala Asn Ala Ser Gly
1 5 10 15

Pro Asp Pro Ala Leu Ser Cys Ser Asn Ala Ser Thr Leu Ala Pro Leu
20 25 30

Pro Ala Pro Leu Ala Val Ala Val Pro Val Val Tyr Ala Val Ile Cys
35 40 45

Ala Val Gly Leu Ala Gly Asn Ser Ala Val Leu Tyr Val Leu Leu Arg
50 55 60

Ala Pro Arg Met Lys Thr Val Thr Asn Leu Phe Ile Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Glu Leu Phe Thr Leu Val Leu Pro Ile Asn Ile Ala Asp
85 90 95

Phe Leu Leu Arg Gln Trp Pro Phe Gly Glu Leu Met Cys Lys Leu Ile
100 105 110

Val Ala Ile Asp Gln Tyr Asn Thr Phe Ser Ser Leu Tyr Phe Leu Thr
115 120 125

Val Met Ser Ala Asp Arg Tyr Leu Val Val Leu Ala Thr Ala Glu Ser
130 135 140

Arg Arg Val Ala Gly Arg Thr Tyr Ser Ala Ala Arg Ala Val Ser Leu
145 150 155 160

Ala Val Trp Gly Ile Val Thr Leu Val Val Leu Pro Phe Ala Val Phe
165 170 175

Ala Arg Leu Asp Asp Glu Gln Gly Arg Arg Gln Cys Val Leu Val Phe
180 185 190

Pro Gln Pro Glu Ala Phe Trp Trp Arg Ala Ser Arg Leu Tyr Thr Leu
195 200 205

Val Leu Gly Phe Ala Ile Pro Val Ser Thr Ile Cys Val Leu Tyr Thr
210 215 220

Thr Leu Leu Cys Arg Leu His Ala Met Arg Leu Asp Ser His Ala Lys
225 230 235 240

Ala Leu Glu Arg Ala Lys Lys Arg Val Thr Phe Leu Val Val Ala Ile
245 250 255

Leu Ala Val Cys Leu Leu Cys Trp Thr Pro Tyr His Leu Ser Thr Val
260 265 270

Val Ala Leu Thr Thr Asp Leu Pro Gln Thr Pro Leu Val Ile Ala Ile
275 280 285

Ser Tyr Phe Ile Thr Ser Leu Thr Tyr Ala Asn Ser Cys Leu Asn Pro
290 295 300

Phe Leu Tyr Ala Phe Leu Asp Ala Ser Phe Arg Arg Asn Leu Arg Gln
305 310 315 320

Leu Ile Thr Cys Arg Ala Ala Ala
325

(14) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
CGGAATTCGT CAACGGTCCC AGCTACAATG 30

(15) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

ATGGATCCCA GGCCCTTCAG CACCGCAATA T 31 

(16) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

ATGCAGGCCG CTGGGCACCC AGAGCCCCTT GACAGCAGGG GCTCCTTCTC CCTCCCCACG 60

ATGGGTGCCA ACGTCTCTCA GGACAATGGC ACTGGCCACA ATGCCACCTT CTCCGAGCCA 120

CTGCCGTTCC TCTATGTGCT CCTGCCCGCC GTGTACTCCG GGATCTGTGC TGTGGGGCTG 180

ACTGGCAACA CGGCCGTCAT CCTTGTAATC CTAAGGGCGC CCAAGATGAA GACGGTGACC 240

AACGTGTTCA TCCTGAACCT GGCCGTCGCC GACGGGCTCT TCACGCTGGT ACTGCCCGTC 300

AACATCGCGG AGCACCTGCT GCAGTACTGG CCCTTCGGGG AGCTGCTCTC CAAGCTGGTG 360

CTGGCCGTCG ACCACTACAA CATCTTCTCC AGCATCTACT TCCTAGCCGT GATGAGCGTG 420

GACCGATACC TGGTGGTGCT GGCCACCGTG AGGTCCCGCC ACATGCCCTG GCGCACCTAC 480

CGGGGGGCGA AGGTCGCCAG CCTGTGTGTC TGGCTGGGCG TCACGGTCCT GGTTCTGCCC 540

TTCTTCTCTT TCGCTGGCGT CTACAGCAAC GAGCTGCAGG TCCCAAGCTG TGGGCTGAGC 600

TTCCCGTGGC CCGAGCGGGT CTGGTTCAAG GCCAGCCGTG TCTACACTTT GGTCCTGGGC 660

TTCGTGCTGC CCGTGTGCAC CATCTGTGTG CTCTACACAG ACCTCCTGCG CAGGCTGCGG 720

GCCGTGCGGC TCCGCTCTGG AGCCAAGGCT CTAGGCAAGG CCAGGCGGAA GGTGACCGTC 780

CTGGTCCTCG TCGTGCTGGC CGTGTGCCTC CTCTGCTGGA CGCCCTTCCA CCTGGCCTCT 840

GTCGTGGCCC TGACCACGGA CCTGCCCCAG ACCCCACTGG TCATCAGTAT GTCCTACGTC 900

ATCACCAGCC TCACGTACGC CAACTCGTGC CTGAACCCCT TCCTCTACGC CTTTCTAGAT 960

GACAACTTCC GGAAGAACTT CCGCAGCATA TTGCGGTGCT GA 1002
(17) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

Met Gln Ala Ala Gly His Pro Glu Pro Leu Asp Ser Arg Gly Ser Phe
1 5 10 15

Ser Leu Pro Thr Met Gly Ala Asn Val Ser Gln Asp Asn Gly Thr Gly
20 25 30

His Asn Ala Thr Phe Ser Glu Pro Leu Pro Phe Leu Tyr Val Leu Leu
35 40 45

Pro Ala Val Tyr Ser Gly Ile Cys Ala Val Gly Leu Thr Gly Asn Thr
50 55 60

Ala Val Ile Leu Val Ile Leu Arg Ala Pro Lys Met Lys Thr Val Thr
65 70 75 80

Asn Val Phe Ile Leu Asn Leu Ala Val Ala Asp Gly Leu Phe Thr Leu
85 90 95

Val Leu Pro Val Asn Ile Ala Glu His Leu Leu Gln Tyr Trp Pro Phe
100 105 110

Gly Glu Leu Leu Cys Lys Leu Val Leu Ala Val Asp His Tyr Asn Ile
115 120 125

Phe Ser Ser Ile Tyr Phe Leu Ala Val Met Ser Val Asp Arg Tyr Leu
130 135 140

Val Val Leu Ala Thr Val Arg Ser Arg His Met Pro Trp Arg Thr Tyr
145 150 155 160

Arg Gly Ala Lys Val Ala Ser Leu Cys Val Trp Leu Gly Val Thr Val
165 170 175

Leu Val Leu Pro Phe Phe Ser Phe Ala Gly Val Tyr Ser Asn Glu Leu
180 185 190

Gln Val Pro Ser Cys Gly Leu Ser Phe Pro Trp Pro Glu Arg Val Trp
195 200 205

Phe Lys Ala Ser Arg Val Tyr Thr Leu Val Leu Gly Phe Val Leu Pro
210 215 220

Val Cys Thr Ile Cys Val Leu Tyr Thr Asp Leu Leu Arg Arg Leu Arg
225 230 235 240

Ala Val Arg Leu Arg Ser Gly Ala Lys Ala Leu Gly Lys Ala Arg Arg
245 250 255

Lys Val Thr Val Leu Val Leu Val Val Leu Ala Val Cys Leu Leu Cys
260 265 270

Trp Thr Pro Phe His Leu Ala Ser Val Val Ala Leu Thr Thr Asp Leu
275 280 285

Pro Gln Thr Pro Leu Val Ile Ser Met Ser Tyr Val Ile Thr Ser Leu
290 295 300

Thr Tyr Ala Asn Ser Cys Leu Asn Pro Phe Leu Tyr Ala Phe Leu Asp
305 310 315 320

Asp Asn Phe Arg Lys Asn Phe Arg Ser Ile Leu Arg Cys
325 330

(18) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
ACGAATTCAG CCATGGTCCT TGAGGTGAGT GACCACCAAG TGCTAAAT 48

(19) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
GAGGATCCTG GAATGCGGGG AAGTCAG 27

(20) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1107 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATGGTCCTTG AGGTGAGTGA CCACCAAGTG CTAAATGACG CCGAGGTTGC CGCCCTCCTG 60

GAGAACTTCA GCTCTTCCTA TGACTATGGA GAAAACGAGA GTGACTCGTG CTGTACCTCC 120

CCGCCCTGCC CACAGGACTT CAG^CTGAAC TTCGACCGGG CCTTCCTGCC AGCCCTCTAC 180

AGCCTCCTCT TTCTGCTGGG GCTGCTGGGC AACGGCGCGG TGGCAGCCGT GCTGCTGAGC 240

CGGCGGACAG CCCTGAGCAG CACCGACACC TTCCTGCTGG ACCTAGCTGT AGCAGACACG 300

CTGCTGGTGC TGACACTGCC GCTCTGGGCA GTGGACGCTG CCGTCCA<^TG GGTGTTTGGC 360

TCTGGCCTCT GCAAAGTGGC AGGTGCCCTC TTCAACAT::A ACTTCTACGC AGGAGCCCTC 420

CTGCTGGCCT GCATCAGCTT TGACCGCTAC CTGAACATAG TTCATGCCAC CCAGCTCTAC 480

CGCCGGGGGC CCCCGGCCCG CGTGACCCTC ACCTGCCTGG CTGTCTGGGG GCTCTGCCTG 540

CTTTTCGCCC TCCCAGACTT CATCTTCCTG TCGGCCCACC ACGACGAGCG CCTCAACGCC 600

ACCCACTGrC AATACAACTT CCCACAGGTG GGCCGCACGG CTCTGCGGGT GCTGCAGCTG 660

GTGGGTGG::T TTCTGCTGCC CCTGCTGGTC ATGGCCTAGT GCTATGCCCA CATCCTGGCC 720

GTGCTGCTGG TTTCCAG'IGG CCAGCGGCGC CTGCGGGCCA TGCGGCTGGT GGTGGTGGTC 780

GTGGTGGCCT TTGCCCTCTG CTGGACCCCC TATCACCTGG TGGTGCTGGT GGACATCCTC 840

ATGGACCTGG GCGCTTTGGC CCGCAACTGT GGCCGAGAAA GCAGGGTAGA CGTGGCCAAG 900

TCGGTCACCT CAGGCCTGGG CTACATGCAC TGCTGCCTCA ACCCGCTGCT CTATGCCTT':- 960

GTAGGGGTCA AGTTCCGGGA GCGGATGTGG ATGCTGCTCT TGCGCCTGGG CTGCCCCAAC 1020

CAGAGAGGGC TCCAGAGGCA GCCATCGTCT TCCCGCCGGG ATTCATCCTG GTCTGAGACC 1080

TCAGAGGCCT CCTACTCGGG CTTGTGA 1107

(21) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 368 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Val Leu Glu Val Ser Asp His Gln Val Leu Asn Asp Ala Glu Val
1 5 10 15

Ala Ala Leu Leu Glu Asn Phe Ser Ser Ser Tyr Asp Tyr Gly Glu Asn
20 25 30

Glu Ser Asp Ser Cys Cys Thr Ser Pro Pro Cys Pro Gln Asp Phe Ser
35 40 45

Leu Asn Phe Asp Arg Ala Phe Leu Pro Ala Leu Tyr Ser Leu Leu Phe
50 55 60

Leu Leu Gly Leu Leu Gly Asn Gly Ala Val Ala Ala Val Leu Leu Ser
65 70 75 80

Arg Arg Thr Ala Leu Ser Ser Thr Asp Thr Phe Leu Leu His Leu Ala
85 90 95

Val Ala Asp Thr Leu Leu Val Leu Thr Leu Pro Leu Trp Ala Val Asp
100 105 110

Ala Ala Val Gln Trp Val Phe Gly Ser Gly Leu Cys Lys Val Ala Gly
115 120 125

Ala Leu Phe Asn Ile Asn Phe Tyr Ala Gly Ala Leu Leu Leu Ala Cys
130 135 140

Ile Ser Phe Asp Arg Tyr Leu Asn Ile Val His Ala Thr Gln Leu Tyr
145 150 155 160

Arg Arg Gly Pro Pro Ala Arg Val Thr Leu Thr Cys Leu Ala Val Trp
165 170 175

Gly Leu Cys Leu Leu Phe Ala Leu Pro Asp Phe Ile Phe Leu Ser Ala
180 185 190

His His Asp Glu Arg Leu Asn Ala Thr His Cys Gln Tyr Asn Phe Pro
195 200 205

Gln Val Gly Arg Thr Ala Leu Arg Val Leu Gln Leu Val Ala Gly Phe
210 215 220

Leu Leu Pro Leu Leu Val Met Ala Tyr Cys Tyr Ala His Ile Leu Ala
225 230 235 240

Val Leu Leu Val Ser Arg Gly Gln Arg Arg Leu Arg Ala Met Arg Leu
245 250 255

Val Val Val Val Val Val Ala Phe Ala Leu Cys Trp Thr Pro Tyr His
260 265 270

Leu Val Val Leu Val Asp Ile Leu Met Asp Leu Gly Ala Leu Ala Arg
275 280 285

Asn Cys Gly Arg Glu Ser Arg Val Asp Val Ala Lys Ser Val Thr Ser
290 295 300

Gly Leu Gly Tyr Met His Cys Cys Leu Asn Pro Leu Leu Tyr Ala Phe
305 310 315 320

Val Gly Val Lys Phe Arg Glu Arg Met Trp Met Leu Leu Leu Arg Leu
325 330 335

Gly Cys Pro Asn Gln Arg Gly Leu Gln Arg Gln Pro Ser Ser Ser Arg
340 345 350

Arg Asp Ser Ser Trp Ser Glu Thr Ser Glu Ala Ser Tyr Ser Gly Leu 
355 360 365 

(22) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TTAAGCTTGA CCTAATGCCA TCTTGTGTCC 30

(23) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TTGGATCCAA AAGAACCATG CACCTCAGAG 30

(24) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1074 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

ATGGCTGATG A^TATCIGCTC TGAATCCACA TCTTCCATGG AAGACTACGT TAACTTCAAC 60

TTCACTGACT T:TACTGTGA GAAAAACAAT GTCAGGCAGT TTGCGAGCCA TTTCCTCCCA 120

CCCTTGTACT GGCTCGTGTT CATCGTGGGT GCCTTi^GGA ACAGTCTTGT TATCCTTGTC 180

TACTGGTACT GCACAAGAGT GAAGACCATG ACCGACATGT TCCTTTTGAA TTTGGCAATT 240

GCTGACCTCC TCTTTCTTGT CACTCTTCCC TTCTGGGCCA TTGCTGCTGC TGACCAGTGG 300

AAGTTCCAGA C-:tTCATGTG CAAGGTGGTC AACAGGATGT ACAAGATGAA CTTCTACAG'": 360

TGTGTGTTGC TGATCATGTG CATCAGCGTG GACAGGTACA TTGCCATTGC CCAGGCCATG 420

AGAGCACATA CTTGGAGGGA GAAAAGGCTT TTGTACAGGA AAATGGTTTG CTTTACCATC 480

TGGGTATTGG CAGCTGCTCT CTGCATCCCA GAjPyATCTTAT ACAGGCAAAT CAAGGAGGAA 540

TCCGGCATTG CTATCTGCAC CATGGTTTAC CCTAGCGATG AGAGCACCAA ACTGAAGTCA 600

GCTGTCTTGA CCCTGAAGGT CATTCTGGGG TTCTTCCTTC CCTTCGTGGT CATGGCTTGG 660

TGCTATACCA TCATCATTCA CACCCTGATA CAAGCCAAGA AGTCTTCCAA GCACAAAGC: 720

CTAAAAGTGA CCATCACTGT CCTGACCGTC TTTGTCTTGT Cl'CAGTTTCC CTACAACTGC 780

ATTTTGTTGG TGCAGACCAT TGACGCCTAT GCCATGTTCA TCTCCAACTG TGCCGTTTCC 840

ACCAACATTG ACATCTGCTT CCAGGTCACC CAGACCATCG CCTTCTTCCA CAGTTGCCTG 900

AACCCTGTTC TCTATGTTTT TGTGGGTGAG AGATTCCGCC GGGATCTCGT GAAAACCCTG 960

AAGAACTTGG GTTGCATCAG CCAGGCCCAG TGGGTTTCAT TTACAAGGAG AGAGGGAAGC 1020

TTGAAGCTGT CGTCTATGTT GCTGGAGACA ACCTCAGGAG CACTCTCCCT CTGA 1074
(25) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 357 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Met Ala Asp Asp Tyr Gly Ser Glu Ser Thr Ser Ser Met Glu Asp Tyr
1 5 10 15

Val Asn Phe Asn Phe Thr Asp Phe Tyr Cys Glu Lys Asn Asn Val Arg
20 25 30

Gln Phe Ala Ser His Phe Leu Pro Pro Leu Tyr Trp Leu Val Phe Ile
35 40 45

Val Gly Ala Leu Gly Asn Ser Leu Val Ile Leu Val Tyr Trp Tyr Cys
50 55 60

Thr Arg Val Lys Thr Met Thr Asp Met Phe Leu Leu Asn Leu Ala Ile
65 70 75 80

Ala Asp Leu Leu Phe Leu Val Thr Leu Pro Phe Trp Ala Ile Ala Ala
85 90 95

Ala Asp Gln Trp Lys Phe Gln Thr Phe Met Cys Lys Val Val Asn Ser
100 105 110

Met Tyr Lys Met Asn Phe Tyr Ser Cys Val Leu Leu Ile Met Cys Ile
115 120 125

Ser Val Asp Arg Tyr Ile Ala Ile Ala Gln Ala Met Arg Ala His Thr
130 135 140

Trp Arg Glu Lys Arg Leu Leu Tyr Ser Lys Met Val Cys Phe Thr Ile
145 150 155 160

Trp Val Leu Ala Ala Ala Leu Cys Ile Pro Glu Ile Leu Tyr Ser Gln
165 170 175

Ile Lys Glu Glu Ser Gly Ile Ala Ile Cys Thr Met Val Tyr Pro Ser
180 185 190

Asp Glu Ser Thr Lys Leu Lys Ser Ala Val Leu Thr Leu Lys Val Ile
195 200 205

Leu Gly Phe Phe Leu Pro Phe Val Val Met Ala Cys Cys Tyr Thr Ile
210 215 220

Ile Ile His Thr Leu Ile Gln Ala Lys Lys Ser Ser Lys His Lys Ala
225 230 235 240

Leu Lys Val Thr Ile Thr Val Leu Thr Val Phe Val Leu Ser Gln Phe
245 250 255

Pro Tyr Asn Cys Ile Leu Leu Val Gln Thr Ile Asp Ala Tyr Ala Met
260 265 270

Phe Ile Ser Asn Cys Ala Val Ser Thr Asn Ile Asp Ile Cys Phe Gln
275 280 285

Val Thr Gln Thr Ile Ala Phe Phe His Ser Cys Leu Asn Pro Val Leu
290 295 300

Tyr Val Phe Val Gly Glu Arg Phe Arg Arg Asp Leu Val Lys Thr Leu
305 310 315 320

Lys Asn Leu Gly Cys Ile Ser Gln Ala Gln Trp Val Ser Phe Thr Arg
325 330 335

Arg Glu Gly Ser Leu Lys Leu Ser Ser Met Leu Leu Glu Thr Thr Ser
340 345 350

Gly Ala Leu Ser Leu
355

(26) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1110 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
ATGGCCTCAT CGACCACTCG GGGCCCCAGG GTTTCTGACT TATTTTCTGG GCTGCCGCCG 60

GCGGTCACAA CTCCCGCCAA CCAGAGCGCA GAGGCCTCGG CGGGCAACGG GTCGGTGGCT 120

GGCGCGGACG CTCCAGCCGT CACGCCCTTC CAGAGCCTGC AGCTGGTGCA TCAGCTGAAG 180

GGGCTGATCG TGCTGCTCTA CAGCGTCGTG GTGGTCGTGG CGCTGGTGGG CAACTGCCTG 240

CTGGTGCTGG TGATCGCGCG GGTGCCGCGG CTGCACAACG TGACGAACTT CCTCATCGGC 300

AACCTGGCCT TGTCCGACGT GCTCATGTGC ACCGCCTGCG TGCCGCTCAC GCTGGCCTAT 360

GCCTTCGAGC CACGCGGCTG GGTGTTCGGC GGCGGCCTGT GCCACCTGGT CTTCTTCCTG 420

CAGCCGGTCA CCGTCTATGT GTCGGTGTTC ACGCTCACCA CCATCGCAGT t^GACCGCTAC 480

GTCGTGCTGG TGCACCCGCT GAGGCGCGCA TCTCGCTGCG CCTCAGCCTA CGCTGTGCTG 540

GCCATCTGGG CGCTGTCCGC GGTGCTGGCG CTGCCGCCCG CCGTGCACAC CTATCACGTG 600

GAGCTCAAGC CGCACGACGT GCGCCTCTGC GAGGAGTTCT GGGGCTCCCA GGAGCGCCAG 660

CGCCAGCTCT ACGCCTGGGG GCTGCTGCTG GTCACCTACC TGCTCCCTCT GCTGGTCATC 720

CTCCTGTCTT ACGTCCGGGT GTCAGTGAAG CTCCGCAACC GCGTGGTGCC GGGCTGCGTG 780

ACCCAGAGCC AGGCCGACTG GGACCGCGCT CGGCGCCGGC GCACCTTCTG CTTGCTGGTG 840
GTGGTCGT_-G TGGTGTTCGC CGTCTGCTGG CTGCCGCTGC ACGTCTTCAA CCTGCTGCGG 900

GACCTCG/VL-C CCCACGCCAT CGACCCTTAC GCGTTTGGGC TGGTGCAGCT GCTCTGCCAC 960

TGGCTCGCCA TGAGTTCGGC CTGCTACAAC CCCTTCATCT ACGGCTGGCT GCACGACAGC 1020

TTCCGCGAGG AGCTGCGCAA AGTGTTGGTC GCTTGGCCCC GGAAGATAGC GCCCGATGGC 1080

CAGAATATGA CCGTCAGCGT GGTCATGTGA 1110

(27) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 369 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Met Ala Ser Ser Thr Thr Arg Gly Pro Arg Val Ser Asp Leu Phe Ser
1 5 10 15

Gly Leu Pro Pro Ala Val Thr Thr Pro Ala Asn Gln Ser Ala Glu Ala
20 25 30

Ser Ala Gly Asn Gly Ser Val Ala Gly Ala Asp Ala Pro Ala Val Thr
35 40 45

Pro Phe Gln Ser Leu Gln Leu Val His Gln Leu Lys Gly Leu Ile Val
50 55 60

Leu Leu Tyr Ser Val Val Val Val Val Gly Leu Val Gly Asn Cys Leu
65 70 75 80

Leu Val Leu Val Ile Ala Arg Val Pro Arg Leu His Asn Val Thr Asn
85 90 95

Phe Leu Ile Gly Asn Leu Ala Leu Ser Asp Val Leu Met Cys Thr Ala
100 105 110

Cys Val Pro Leu Thr Leu Ala Tyr Ala Phe Glu Pro Arg Gly Trp Val
115 120 125

Phe Gly Gly Gly Leu Cys His Leu Val Phe Phe Leu Gln Pro Val Thr
130 135 140

Val Tyr Val Ser Val Phe Thr Leu Thr Thr Ile Ala Val Asp Arg Tyr
145 150 155 160

Val Val Leu Val His Pro Leu Arg Arg Ala Ser Arg Cys Ala Ser Ala
165 170 175

Tyr Ala Val Leu Ala Ile Trp Ala Leu Ser Ala Val Leu Ala Leu Pro
180 185 190

Pro Ala Val His Thr Tyr His Val Glu Leu Lys Pro His Asp Val Arg
195 200 205

Leu Cys Glu Glu Phe Trp Gly Ser Gln Glu Arg Gln Arg Gln Leu Tyr
210 215 220

Ala Trp Gly Leu Leu Leu Val Thr Tyr Leu Leu Pro Leu Leu Val Ile
225 230 235 240

Leu Leu Ser Tyr Val Arg Val Ser Val Lys Leu Arg Asn Arg Val Val
245 250 255

Pro Gly Cys Val Thr Gln Ser Gln Ala Asp Trp Asp Arg Ala Arg Arg
260 265 270

Arg Arg Thr Phe Cys Leu Leu Val Val Val Val Val Val Phe Ala Val
275 280 285

Cys Trp Leu Pro Leu His Val Phe Asn Leu Leu Arg Asp Leu Asp Pro
290 295 300

His Ala Ile Asp Pro Tyr Ala Phe Gly Leu Val Gln Leu Leu Cys His
305 310 315 320

Trp Leu Ala Met Ser Ser Ala Cys Tyr Asn Pro Phe Ile Tyr Ala Trp
325 330 335

Leu His Asp Ser Phe Arg Glu Glu Leu Arg Lys Leu Leu Val Ala Trp
340 345 350

Pro Arg Lys Ile Ala Pro His Gly Gln Asn Met Thr Val Ser Val Val
355 360 365

Ile



(28) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1083 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
ATGGACCCAG AAGAAACTTC AGTTTATTTG GATTATTACT ATGCTACGAG CCCAAACTCT 60

GACATCAGGG AGACCCACTC CCATGTTCCT TACACCTCTG TCTTCCTTCC AGTCTTTTAC 120
ACAGCTGTGT TCCTGACTGG AGTGCTGGGG AACCT7*GTTC TCATGGGAGC GTTGCATTTC 180

AAACCCGGCA GCCGAAGACT GATCGACATC TTTATCATCA ATCTGGCTGC CTCTGACTTC 240

ATTTTTCTTG TCACATTGCC TCTCTGGGTG GATAAAGAAG CATCTCTAGG A^TGTGGAGG 300

ACGGGCTCCT TCCTGTGCAA AGCGAGCTCC TACATGATCT CCGTGAATAT GGACTGCAGT 360

GTCCTCCTGC TCACTTGCAT GAGTGTTGAC CGCTACCTGG CCATTGTGTG GGCAGTCGTA 420

TCCAGGAAAT TCAGAAGGAC AGACTGTGCA TATGTAGTCT GTGCCAGCAT CTGGTTTATC 480

TCCTGCCTGC TGGGGTTGCC TACTCTTCTG TCCAGGGAGC TCACGCTGAT TGATGATAAG 540

CCATACTGTG CAGAGAAAAA GGCAACTCCA ATTAAACT^A TATGGTCCCT GGTGGCCTTA 600

ATTTTCACCT TTTTTGTCCC TTTGTTGAGC ATTGTGACCT GCTACTGTTG CATTGCAAGG 660

AAGCTGTGTG CCCATTACCA GCAATGAGGA AAGCACAACA AAA^^^GCTGAA GAAATCTATA 720

AAGATCATCT TTATTGTCGT GGCAGCCTTT CTTGTCTCCT GGCTGCCCTT CAATACTTTC 780

AAGTTCCTGG CCATTGTCTC TGGGTTGCGG CTJ^GAACAG r ATTTACCGTC AGCTATTCTT 840

CAGCTTGGTA TGGAGGTGAG TGGACCCTTG GCATTTGC-'A ACAGCTGTGT CAACCCTTTC 900

ATTTACTATA TCTTCGACAG CTACATCCGC CGGGCCATTG TCCA'^TGrT^ GTGCCCTTGC 960

CTGAAAAACT ATGACTTTGG GAGTAGCACT GAGACATCAG ATAGTCACCT CACTAAGGCT 1020

CTCTCCACCT TCATTCATGC AGAAGATTTT GCCAGGAG:^A GG7W\GAGGTG TGTGTCACTC 1080

TAA 1083 
(29) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 360 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Asp Pro Glu Glu Thr Ser Val Tyr Leu Asp Tyr Tyr Tyr Ala Thr
1 5 10 15

Ser Pro Asn Ser Asp Ile Arg Glu Thr His Ser His Val Pro Tyr Thr
20 25 30

Ser Val Phe Leu Pro Val Phe Tyr Thr Ala Val Phe Leu Thr Gly Val
35 40 45

Leu Gly Asn Leu Val Leu Met Gly Ala Leu His Phe Lys Pro Gly Ser
50 55 60

Arg Arg Leu Ile Asp Ile Phe Ile Ile Asn Leu Ala Ala Ser Asp Phe
65 70 75 80

Ile Phe Leu Val Thr Leu Pro Leu Trp Val Asp Lys Glu Ala Ser Leu
85 90 95

Gly Leu Trp Arg Thr Gly Ser Phe Leu Cys Lys Gly Ser Ser Tyr Met
100 105 110

Ile Ser Val Asn Met His Cys Ser Val Leu Leu Leu Thr Cys Met Ser
115 120 125

Val Asp Arg Tyr Leu Ala Ile Val Trp Pre Val Val Ser Arg Lys Phe
130 135 140

Arg Arg Thr Asp Cys Ala Tyr Val Val Cys Ala Ser Ile Trp Phe Ile
145 150 155 160

Ser Cys Leu Leu Gly Leu Pro Thr Leu Leu Ser Arg Glu Leu Thr Leu
165 170 175

Ile Asp Asp Lys Pro Tyr Cys Ala Glu Lys Lys Ala Thr Pro Ile Lys
180 185 190

Leu Ile Trp Ser Leu Val Ala Leu Ile Phe Thr Phe Phe Val Pro Leu
195 200 205

Leu Ser Ile Val Thr Cys Tyr Cys Cys Ile Ala Arg Lys Leu Cys Ala
210 215 220

His Tyr Gln Gln Ser Gly Lys His Asn Lys Lys Leu Lys Lys Ser Ile
225 230 235 240

Lys Ile Ile Phe Ile Val Val Ala Ala Phe Leu Val Ser Trp Leu Pro
245 250 255

Phe Asn Thr Phe Lys Phe Leu Ala Ile Val Ser Gly Leu Arg Gln Glu
260 265 270

His Tyr Leu Pro Ser Ala Ile Leu Gln Leu Gly Met Glu Val Ser Gly
275 280 285

Pro Leu Ala Phe Ala Asn Ser Cys Val Asn Pro Phe Ile Tyr Tyr Ile
290 295 300

Phe Asp Ser Tyr Ile Arg Arg Ala Ile Val His Cys Leu Cys Pro Cys
305 310 315 320

Leu Lys Asn Tyr Asp Phe Gly Ser Ser Thr Glu Thr Ser Asp Ser His
325 330 335

Leu Thr Lys Ala Leu Ser Thr Phe Ile His Ala Glu Asp Phe Ala Arg
340 345 350

Arg Arg Lys Arg Ser Val Ser Leu
355 360
355 360 

(30) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CTAGAATTCT GACTCCAGCC AAAGCATGAA T 31 
(31) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GCTGGATCCT AAACAGTCTG CGCTCGGCCT 30

(32) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1020 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

ATGAATGGCC TTGAAGTGGC TCCCCCAGGT CTGATCACCA ACTTCTCCCT GGCCACGGCA 60

GAGCAATGTG GCCAGGAGAC GCCACTGGAG AACATGCT'IT TCGCCTCCTT CTACCTTCTG 120

GATTTTATCC TGGCTTTAGT TGGCAATACC CTGGCTCTGT GGCTTTTCAT CCGAGACCAC 180

AAGTCCGGGA CCCCGGCCAA CGTGTTCCTG ATGCATCTGG CCGTGGCCGA CTTGTCGTGC 240

GTGCTGGTCC TGCCCACCCG CCTGGTCTAC CACTTCTCTG GGAACCACTG GCCATTTGGG 300
GAAATCGCAT GCCGTCTCAC CGGCTTCCTC TTCTACCTCA ACATGTACGC CAGCATCTAC 360

TTCCTCACCT GCATCAGCGC CGACCGTTTC CTGGCCATTG TGCACCCGGT CAAGTCCCTC 420

AAGCTCCGCA GGCCCCTCTA CGGACACCTG GCCTGTGCCT TCCTGTGGGT GGTGGTGGCT 480

GTGGCCATGG CCCCGCTGCT GGTGAGCCCA CAGACCGTGC AGACCAACCA CACGGTGGTC 540

TGCCTGCAGC TGTACCGGGA GAAGGCCTCC CACCATGCCC TGGTGTCCCT GGCAGTGGCC 600

TTCACCTTCC CGTTCATCAG CACGGTCACC TGCTACCTGC TGATCATCCG CAGCCTGCGG 660

CAGGGCCTGC GTGTGGAGAA GCGCCTCAAG ACCAAGGCAG T<JCGCATGAT CGCCATAGTG 720

CTGGCCATCT TCCTGGTCTG CTTCGTGCCC TACCACGTCA ACCGCTCCGT CTACGTGCTG 780

CACTACCGCA GCCATGGGGC CTCCTGCGCC ACCCAGCGCA TCCTGGCCCT GGCAAACCGC 840

ATCACCTCCT GCCTCACCAG CCTCAACGGG GCACTCGAGC CCATCATGTA TTTCTTCGTG 900

GCTGAGAAGT TCCGCCACGC CCTGTGCAAC TTGCTCTGTG GCAAAAGGCT CAAGGGCCCG 960

CCCCCCAGCT TCGAAGGGAA AACCAACGAG AGCTCGCTGA GTGCCAAGTC AGAGCTGTGA 1020 
(33) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Asn Gly Leu Glu Val Ala Pro Pro Gly Leu Ile Thr Asn Phe Ser
1 5 10 15

Leu Ala Thr Ala Glu Gln Cys Gly Gln Glu Thr Pro Leu Glu Asn Met
20 25 30

Leu Phe Ala Ser Phe Tyr Leu Leu Asp Phe Ile Leu Ala Leu Val Gly
35 40 45

Asn Thr Leu Ala Leu Trp Leu Phe Ile Arg Asp His Lys Ser Gly Thr
50 55 60

Pro Ala Asn Val Phe Leu Met His Leu Ala Val Ala Asp Leu Ser Cys
65 70 75 80

Val Leu Val Leu Pro Thr Arg Leu Val Tyr His Phe Ser Gly Asn His
85 90 95

Trp Pro Phe Gly Glu Ile Ala Cys Arg Leu Thr Gly Phe Leu Phe Tyr
100 105 110

Leu Asn Met Tyr Ala Ser Ile Tyr Phe Leu Thr Cys Ile Ser Ala Asp
115 120 125

Arg Phe Leu Ala Ile Val His Pro Val Lys Ser Leu Lys Leu Arg Arg
130 135 140

Pro Leu Tyr Ala His Leu Ala Cys Ala Phe Leu Trp Val Val Val Ala
145 150 155 160

Val Ala Met Ala Pro Leu Leu Val Ser Pro Gln Thr Val Gln Thr Asn
165 170 175

His Thr Val Val Cys Leu Gln Leu Tyr Arg Glu Lys Ala Ser His His
180 185 190

Ala Leu Val Ser Leu Ala Val Ala Phe Thr Phe Pro Phe Ile Thr Thr
195 200 205

Val Thr Cys Tyr Leu Leu Ile Ile Arg Ser Leu Arg Gln Gly Leu Arg
210 215 220

Val Glu Lys Arg Leu Lys Thr Lys Ala Val Arg Met Ile Ala Ile Val
225 230 235 240

Leu Ala Ile Phe Leu Val Cys Phe Val Pro Tyr His Val Asn Arg Ser
245 250 255

Val Tyr Val Leu His Tyr Arg Ser His Gly Ala Ser Cys Ala Thr Gln
260 265 270

Arg Ile Leu Ala Leu Ala Asn Arg Ile Thr Ser Cys Leu Thr Ser Leu
275 280 285

Asn Gly Ala Leu Asp Pro Ile Met Tyr Phe Phe Val Ala Glu Lys Phe
290 295 300 

Arg His Ala Leu Cys Asn Leu Leu Cys Gly Lys Arg Leu Lys Gly Pro 
305 310 315 320 

Pro Pro Ser Phe Glu Gly Lys Thr Asn Glu Ser Ser Leu Ser Ala Lys 
325 330 335 

Ser Glu Leu 

(34) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 



WO 00/22129



PCT/US99/23938




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
ATAAGATGAT CACCCTGAAC AATCAAGAT 29

(35) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
rCCGAATTCA TAACATTTCA CTGTTTATAT TGC 33 
(36) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 996 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

ATGATCACCC TGAACAATCA AGATCAACCT GTCACTTTTA ACAGCTCACA TCChGATGAA 60

TACAAAATTG CAGCCCTTGT CTTCTATAGC TGTATCTTCA TAATTGGATT ATTTGTTAAC 120

ATCACTGCAT TATGGGTTTT CAGTTGTACC ACCAAGAAGA GAACCACGGT AACCATCTAT 180

ATGATGAATG TGGCATTAGT GGACTTGATA TTTATAATGA CTTTACCCTT TCGAATGTTT 240

TATTATGCAA AAGATGCATG GCCATTTGGA GAGTACTTCT GCCAGATTAT TGGAGCTCTC 300

ACAGTGTTTT ACCCAAGCAT TGCTTTATGG CTTCTTGCCT TTATTAGTGC TGACAGATAC 360

ATGGCCATTG TACAGCCGAA GTACGCCAAA GAACTTAAAA ACACGTGCAA AGCCGTGCTG 420

GCGTGTGTGG GAGTCTGGAT AATGACCCTG ACCACGACCA CCCCTCTGCT ACTGCTCTAT 480

AAAGACCCAG ATAAAGACTC CACTCCCGCC ACCTGCCTCA AGATTTCTGA CATCATCTAT 540

CTAAAAGCTG TGAACGTGCT GAACCTCACT CGACTGACAT TTTTTTTCTT GATTCCTTTG 600

TTCATCATGA TTGGGTGCTA CTTGGTCATT ATTCATAATC TCCTTCACGG CAGGACGTCT 660

AAGCTGAAAC CCAAAGTCAA GGAGAAGTCC ATAAGGATCA TCATCACGCT GCTGGTGCAG 720




GTGCTCGTCT GCTTTATGCC CTTCCACATC TGTTTCGCTT TCCTGATGCT GGGAAGGGGC 780

GAGAACAGTT ACAATCCCTG GGGAGCCTTT ACCACCTTCC TCATGAACCT CAGCACGTGT 840

CTGGATGTGA TTCTCTACTA CATCGTTTCA AAACAATTTC AGGCTCGAGT CATTAGTGTC 900

ATGCTATACC GTAATTACCT TCGAAGCCTG CGCAGAAAAA GTTTCCGATC TGGTAGTCTA 960

AGGTCACTAA GCAATATAAA CAGTGAAATG TTATGA 996
(37) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 331 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Met Ile Thr Leu Asn Asn Gln Asp Gln Pro Val Thr Phe Asn Ser Ser
1 5 10 15

His Pro Asp Glu Tyr Lys Ile Ala Ala Leu Val Phe Tyr Ser Cys Ile
20 25 30

Phe Ile Ile Gly Leu Phe Val Asn Ile Thr Ala Leu Trp Val Phe Ser
35 40 45

Cys Thr Thr Lys Lys Arg Thr Thr Val Thr Ile Tyr Met Met Asn Val
50 55 60

Ala Leu Val Asp Leu Ile Phe Ile Met Thr Leu Pro Phe Arg Met Phe
65 70 75 80

Tyr Tyr Ala Lys Asp Ala Trp Pro Phe Gly Glu Tyr Phe Cys Gln Ile
85 90 95

Ile Gly Ala Leu Thr Val Phe Tyr Pro Ser Ile Ala Leu Trp Leu Leu
100 105 110

Ala Phe Ile Ser Ala Asp Arg Tyr Met Ala Ile Val Gln Pro Lys Tyr
115 120 125

Ala Lys Glu Leu Lys Asn Thr Cys Lys Ala Val Leu Ala Cys Val Gly
130 135 140

Val Trp Ile Met Thr Leu Thr Thr Thr Thr Pro Leu Leu Leu Leu Tyr
145 150 155 160

Lys Asp Pro Asp Lys Asp Ser Thr Pro Ala Thr Cys Leu Lys Ile Ser
165 170 175

Asp Ile Ile Tyr Leu Lys Ala Val Asn Val Leu Asn Leu Thr Arg Leu
180 185 190

Thr Phe Phe Phe Leu Ile Pro Leu Phe Ile Met Ile Gly Cys Tyr Leu
195 200 205

Val Ile Ile His Asn Leu Leu His Gly Arg Thr Ser Lys Leu Lys Pro
210 215 220

Lys Val Lys Glu Lys Ser Ile Arg Ile Ile Ile Thr Leu Leu Val Gln
225 230 235 240

Val Leu Val Cys Phe Met Pro Phe His Ile Cys Phe Ala Phe Leu Met
245 250 255

Leu Gly Thr Gly Glu Asn Ser Tyr Asn Pro Trp Gly Ala Phe Thr Thr
260 265 270

Phe Leu Met Asn Leu Ser Thr Cys Leu Asp Val Ile Leu Tyr Tyr Ile
275 280 285

Val Ser Lys Gln Phe Gln Ala Arg Val Ile Ser Val Met Leu Tyr Arg
290 295 300

Asn Tyr Leu Arg Ser Leu Arg Arg Lys Ser Phe Arg Ser Gly Ser Leu
305 310 315 320

Arg Ser Leu Ser Asn Ile Asn Ser Glu Met Leu
325 330

(38) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CCAAGCTTCC AGGCCTGGGG TGTGCTGG 28

(39) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
ATGGATCCTG ACCTTCGGCC CCTGGCAGA 29
(40) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1077 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

ATGCCCTCTG TGTCTCCAGC GGGGCCCTCG GCCGGGGCAG TCCCCAATGC CACCGCAGTG 60 

ACAACAGTGC GGACCAATGC CAGCGGGCTG GAGGTGCCCC TGTTCCACCT GTTTGCCCGG 120

CTGGACGAGG AGCTGCATGG CACCTTCCCA GGCCTGTGCG TGGCGCTGAT GGCGGTGCAC 180

GGAGCCATCT TCCTGGCAGG GCTGGTGCTC AACGGGCTGG CGCTGTACGT CTTCTGCTGC 240

CGCACCCGGG CCAAGACACC CTCAGTCATC TACACCATCA ACCTGGTGGT GACCGATCTA 300

CTGGTAGGGC TGTCCCTGCC CACGCGCTTC GCTGTGTACT ACGGCGCCAG GGGCTGCCTG 360

CGCTGTGCCT TCCCGCACGT CCTCGGTTAC TTCCTCAACA TGCACTGCTC CATCCTCTTC 420

CTCACCTGCA TCTGCGTGGA CCGCTACCTG GCCATCGTGC GGCCCGAAGG CTCCCGCCGC 480

TGCCGCCAGC CTGCCTGTGC CAGGGCCGTG TGCGCCTTCG TGTGGCTGGC CGCCGGTGCC 540

GTCACCCTGT CGGTGCTGGG CGTGACAGGC AGCCGGCCCT GCTGCCGTGT CTTTGCGCTG 600

ACTGTCCTGG AGTTCCTGCT GCCCCTGCTG GTCATCAGCG TGTTTACCGG CCGCATCATG 660

TGTGCACTGT CGCGGCCGGG TCTGCTCCAC CAGGGTCGCC AGCGCCGCGT GCGGGCCATG 720

CAGCTCCTGC TCACGGTGCT CATCATCTTT CTCGTCTGCT TCACGCCCTT CCACGCCCGC 780

CAAGTGGCCG TGGCGCTGTG GCCCGACATG CCACACCACA CGAGCCTCGT GGTCTACCAC 840

GTGGCCGTGA CCCTCAGCAG CCTCAACAGC TGCATGGACC CCATCGTCTA CTGCTTCGTC 900

ACCAGTGGCT TCCAGGCCAC CGTCCGAGGC CTCTTCGGCC AGCACGGAGA GCGTGAGCCC 960

AGCAGCGGTG ACGTGGTCAG CATGCACAGG AGCTCCAAGG GCTCAGGCCG TCATCACATC 1020

CTCAGTGCCG GCCCTCACGC CCTCACCCAG GCCCTGGCTA ATGGGCCCGA GGCTTAG 1077

(41) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 358 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Met Pro Ser Val Ser Pro Ala Gly Pro Ser Ala Gly Ala Val Pro Asn 
1 5 10 15 

Ala Thr Ala Val Thr Thr Val Arg Thr Asn Ala Ser Gly Leu Glu Val 
20 25 30 

Pro Leu Phe His Leu Phe Ala Arg Leu Asp Glu Glu Leu His Gly Thr 
35 40 45

Phe Pro Gly Leu Cys Val Ala Leu Met Ala Val His Gly Ala Ile Phe
50 55 60

Leu Ala Gly Leu Val Leu Asn Gly Leu Ala Leu Tyr Val Phe Cys Cys
65 70 75 80

Arg Thr Arg Ala Lys Thr Pro Ser Val Ile Tyr Thr Ile Asn Leu Val
85 90 95

Val Thr Asp Leu Leu Val Gly Leu Ser Leu Pro Thr Arg Phe Ala Val
100 105 110

Tyr Tyr Gly Ala Arg Gly Cys Leu Arg Cys Ala Phe Pro His Val Leu
115 120 125

Gly Tyr Phe Leu Asn Met His Cys Ser Ile Leu Phe Leu Thr Cys Ile
130 135 140

Cys Val Asp Arg Tyr Leu Ala Ile Val Arg Pro Glu Ala Pro Ala Ala
145 150 155 160

Cys Arg Gln Pro Ala Cys Ala Arg Ala Val Cys Ala Phe Val Trp Leu
165 170 175

Ala Ala Gly Ala Val Thr Leu Ser Val Leu Gly Val Thr Gly Ser Arg
180 185 190

Pro Cys Cys Arg Val Phe Ala Leu Thr Val Leu Glu Phe Leu Leu Pro
195 200 205

Leu Leu Val Ile Ser Val Phe Thr Gly Arg Ile Met Cys Ala Leu Ser
210 215 220

Arg Pro Gly Leu Leu His Gln Gly Arg Gln Arg Arg Val Arg Ala Met
225 230 235 240

Gln Leu Leu Leu Thr Val Leu Ile Ile Phe Leu Val Cys Phe Thr Pro
245 250 255

Phe His Ala Arg Gln Val Ala Val Ala Leu Trp Pro Asp Met Pro His
260 265 270

His Thr Ser Leu Val Val Tyr His Val Ala Val Thr Leu Ser Ser Leu
275 280 285

Asn Ser Cys Met Asp Pro Ile Val Tyr Cys Phe Val Thr Ser Gly Phe
290 295 300

Gln Ala Thr Val Arg Gly Leu Phe Gly Gln His Gly Glu Arg Glu Pro
305 310 315 320

Ser Ser Gly Asp Val Val Ser Met His Arg Ser Ser Lys Gly Ser Gly
325 330 335

Arg His His Ile Leu Ser Ala Gly Pro His Ala Leu Thr Gln Ala Leu
340 345 350 

Ala Asn Gly Pro Glu Ala 

355 

(42) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GAGAATTCAC TCCTGAGCTC AAGATGAACT 30

(43) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CGGGATCCCC GTAACTGAGC CACTTCAGAT 30

(44) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1050 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

ATGAACTCCA CCTTGGATGG TAATCAGAGC AGCCACCCTT TTTGCCTCTT GGCATTTGGC 60

TATTTGGAAA CTGTCAATTT TTGCCTTTTG GAAGTATTGA TTATTGTCTT TCTAACTGTA 120

TTGATTATTT CTGGCAACAT CATTGTGATT TTTGTATTTC ACTGTGCACC TTTGTTCAAC 180

CATCACACTA CAAGTTATTT TATCCAGACT ATGGCATATG CTGACCTTTT TGTTGGGGTG 240

AGCTGCGTGG TCCCTTCTTT ATCACTCCTC CATCACCCCC TTCCAGTAGA GGAGTCCTTG 300

ACTTGCCAGA TATTTGGTTT TGTAGTATCA GTTCTGAAGA GCGTCTCCAT GGCTTCTCTG 360

GCCTGTATCA GCATTGATAG ATACATTGCC ATTACTAAAC CTTTAACCTA TAATACTCTG 420

GTTACACCCT GGAGACTACG CCTGTGTATT TTCCTGATTT GGCTATACTC GACCCTGGTC 480

TTCCTGCCTT CCTTTTTCCA CTGGGGCAAA CCTGGATATC ATGGAGATGT GTTTGAGTGG 540

TGTGCGGAGT CCTGGCACAC CGACTCCTAC TTCACCCTGT TCATCGTGAT GATGTTATAT 600

GCCCCAGCAG CCCTTATTGT CTGCTTCACC TATTTCAACA TCTTCCGCAT CTGCCAACAG 660

CACACAAAGG ATATCAGCGA AAGGCAAGCC CGCTTCAGCA GCCAGAGTGG GGAGACTGGG 720

GAAGTGCAGG CCTGTCCTGA TAAGCGCTAT GCCATGGTCC TGTTTCGAAT CACTAGTGTA 780

TTTTACATCC TCTGGTTGCC ATATATCATC TACTTCTTGT TGGAAAGCTC CACTGGCCAC 840

AGCAACCGCT TCGCATCCTT CTTGACCACC TGGCTTGCTA TTAGTAACAG TTTCTGCAAC 900

TGTGTAATTT ATAGTCTCTC CAACAGTGTA TTCCAAAGAG GACTAAAGCG CCTCTCAGGG 960

GCTATGTGTA CTTCTTGTGC AAGTCAGACT ACAGCCAACG ACCCTTACAC AGTTAGAAGC 1020

AAAGGCCCTC TTAATGGATG TCATATCTGA 1050
(45) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
Met Asn Ser Thr Leu Asp Gly Asn Gln Ser Ser His Pro Phe Cys Leu
1 5 10 15

Leu Ala Phe Gly Tyr Leu Glu Thr Val Asn Phe Cys Leu Leu Glu Val
20 25 30

Leu Ile Ile Val Phe Leu Thr Val Leu Ile Ile Ser Gly Asn Ile Ile
35 40 45

Val Ile Phe Val Phe His Cys Ala Pro Leu Leu Asn His His Thr Thr
50 55 60

Ser Tyr Phe Ile Gln Thr Met Ala Tyr Ala Asp Leu Phe Val Gly Val
65 70 75 80

Ser Cys Val Val Pro Ser Leu Ser Leu Leu His His Pro Leu Pro Val
85 90 95

Glu Glu Ser Leu Thr Cys Gln Ile Phe Gly Phe Val Val Ser Val Leu
100 105 110

Lys Ser Val Ser Met Ala Ser Leu Ala Cys Ile Ser Ile Asp Arg Tyr
115 120 125

Ile Ala Ile Thr Lys Pro Leu Thr Tyr Asn Thr Leu Val Thr Pro Trp
130 135 140

Arg Leu Arg Leu Cys Ile Phe Leu Ile Trp Leu Tyr Ser Thr Leu Val
145 150 155 160

Phe Leu Pro Ser Phe Phe His Trp Gly Lys Pro Gly Tyr His Gly Asp
165 170 175

Val Phe Gln Trp Cys Ala Glu Ser Trp His Thr Asp Ser Tyr Phe Thr
180 185 190

Leu Phe Ile Val Met Met Leu Tyr Ala Pro Ala Ala Leu Ile Val Cys
195 200 205

Phe Thr Tyr Phe Asn Ile Phe Arg Ile Cys Gln Gln His Thr Lys Asp
210 215 220

Ile Ser Glu Arg Gln Ala Arg Phe Ser Ser Gln Ser Gly Glu Thr Gly
225 230 235 240

Glu Val Gln Ala Cys Pro Asp Lys Arg Tyr Ala Met Val Leu Phe Arg
245 250 255

Ile Thr Ser Val Phe Tyr Ile Leu Trp Leu Pro Tyr Ile Ile Tyr Phe
260 265 270

Leu Leu Glu Ser Ser Thr Gly His Ser Asn Arg Phe Ala Ser Phe Leu
275 280 285

Thr Thr Trp Leu Ala Ile Ser Asn Ser Phe Cys Asn Cys Val Ile Tyr
290 295 300 
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Ser I.eu Ser Asn 



Ser 



Val Phe Gin 



Arg Gly Leu Ly.s Arg Leu Ser Gly 
315 320 



305 



310 



Ala Met Cys Thr 



Ser 
325 



Cys Ala Ser 



Gin Thr Thr Ala Asn Asp Pro Tyr 

330 335 



Thr Val Arg Ser 
340 



Lys 



Gly Pro Leu 



Asn Gly Cys His He 
34S 



(46) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
TCCCCCGGGA AAAAAACCAA CTGCTCCAAA 30

(47) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
TAGGATCCAT TTGAATGTGG ATTTGGTGAA A 31
(48) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

ATGTGTTTTT CTCCCATTCT GGAAATCAAC ATGCAGTCTG AATCTAACAT TACAGTGCGA 60

GATGACATTG ATGACATCAA CACCAATATG TACCAACCAC TATCATATCC GTTAAGCTTT 120

CAAGTGTCTC TCACCGGATT TCTTATGTTA GAAATTGTGT TGGGACTTGG CAGCAACCTC 180

ACTGTATTGG TACTTTACTG CATGAAATCC AACTTAATCA ACTCTGTCAG TAACATTATT 240

ACAATGAATC TTCATGTACT TGATGTAATA ATTTGTGTGG GATGTATTCC TCTAACTATA 300

GTTATCCTTC TGCTTTCACT GGAGAGTAAC ACTGCTCTCA TTTGCTGTTT CCATGAGGCT 360

TGTGTATCTT TTGCAAGTGT CTCAACAGCA ATCAACGTTT TTGCTATCAC TTTGGACAGA 420

TATGACATCT CTGTAAAACC TGCAAACCGA ATTCTGACAA TGGGCAGAGC TGTAATGTTA 480

ATGATATCCA TTTGGATTTT TTCTTTTTTC TCTTTCCTGA TTCCTTTTAT TGAGGTAAAT 540

TTTTTCAGTC TTCAAAGTGG AAATACCTGG GAAAACAAGA CACTTTTATG TGTCAGTACA 600

AATGAATACT ACACTGAACT GGGAATGTAT TATCACCTGT TAGTACAGAT CCCAATATTC 660

TTTTTCACTG TTGTAGTAAT GTTAATCACA TACACCAAAA TACTTCAGGC TCTTAATATT 720

CGAATAGGCA CAAGATTTTC AACAGGGCAG AAGAAGAAAG CAAGAAAGAA AAAGACAATT 780

TCTCTAACCA CACAACATGA GGCTACAGAC ATGTCACAAA GCAGTGGTGG GAGAAATGTA 840

GTCTTTGGTG TAAGAACTTC AGTTTCTGTA ATAATTGCCC TCCGGCGAGC TGTGAAACGA 900

CACCGTGAAC GACGAGAAAG ACAAAAGAGA GTCTTCAGGA TGTCTTTATT GATTATTTCT 960

ACATTTCTTC TCTGCTGGAC ACCAATTTCT GTTTTAAATA CCACCATTTT ATGTTTAGGC 1020

CCAAGTGACC TTTTAGTAAA ATTAAGATTG TGTTTTTTAG TCATGGCTTA TGGAACAACT 1080

ATATTTCACC CTCTATTATA TGCATTCACT AGACAAAAAT TTCAAAAGGT CTTGAAAAGT 1140

AAAATGAAAA AGCGAGTTGT TTCTATAGTA GAAGCTGATC CCCTGCCTAA TAATGCTGTA 1200

ATACACAACT CTTGGATAGA TCCCAAAAGA AACAAAAAAA TTACCTTTGA AGATAGTGAA 1260

ATAAGAGAAA AACGTTTAGT GCCTCAGGTT GTCACAGACT AG 1302
(49) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 433 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Met Cys Phe Ser Pro Ile Leu Glu Ile Asn Met Gln Ser Glu Ser Asn
1 5 10 15

Ile Thr Val Arg Asp Asp Ile Asp Asp Ile Asn Thr Asn Met Tyr Gln
20 25 30

Pro Leu Ser Tyr Pro Leu Ser Phe Gln Val Ser Leu Thr Gly Phe Leu
35 40 45

Met Leu Glu Ile Val Leu Gly Leu Gly Ser Asn Leu Thr Val Leu Val
50 55 60

Leu Tyr Cys Met Lys Ser Asn Leu Ile Asn Ser Val Ser Asn Ile Ile
65 70 75 80

Thr Met Asn Leu His Val Leu Asp Val Ile Ile Cys Val Gly Cys Ile
85 90 95

Pro Leu Thr Ile Val Ile Leu Leu Leu Ser Leu Glu Ser Asn Thr Ala
100 105 110

Leu Ile Cys Cys Phe His Glu Ala Cys Val Ser Phe Ala Ser Val Ser
115 120 125

Thr Ala Ile Asn Val Phe Ala Ile Thr Leu Asp Arg Tyr Asp Ile Ser
130 135 140

Val Lys Pro Ala Asn Arg Ile Leu Thr Met Gly Arg Ala Val Met Leu
145 150 155 160

Met Ile Ser Ile Trp Ile Phe Ser Phe Phe Ser Phe Leu Ile Pro Phe
165 170 175

Ile Glu Val Asn Phe Phe Ser Leu Gln Ser Gly Asn Thr Trp Glu Asn
180 185 190

Lys Thr Leu Leu Cys Val Ser Thr Asn Glu Tyr Tyr Thr Glu Leu Gly
195 200 205

Met Tyr Tyr His Leu Leu Val Gln Ile Pro Ile Phe Phe Phe Thr Val
210 215 220

Val Val Met Leu Ile Thr Tyr Thr Lys Ile Leu Gln Ala Leu Asn Ile
225 230 235 240

Arg Ile Gly Thr Arg Phe Ser Thr Gly Gln Lys Lys Lys Ala Arg Lys
245 250 255

Lys Lys Thr Ile Ser Leu Thr Thr Gln His Glu Ala Thr Asp Met Ser
260 265 270

Gln Ser Ser Gly Gly Arg Asn Val Val Phe Gly Val Arg Thr Ser Val
275 280 285

Ser Val Ile Ile Ala Leu Arg Arg Ala Val Lys Arg His Arg Glu Arg
290 295 300

Arg Glu Arg Gln Lys Arg Val Phe Arg Met Ser Leu Leu Ile Ile Ser
305 310 315 320

Thr Phe Leu Leu Cys Trp Thr Pro Ile Ser Val Leu Asn Thr Thr Ile
325 330 335

Leu Cys Leu Gly Pro Ser Asp Leu Leu Val Lys Leu Arg Leu Cys Phe
340 345 350

Leu Val Met Ala Tyr Gly Thr Thr Ile Phe His Pro Leu Leu Tyr Ala
355 360 365

Phe Thr Arg Gln Lys Phe Gln Lys Val Leu Lys Ser Lys Met Lys Lys
370 375 380

Arg Val Val Ser Ile Val Glu Ala Asp Pro Leu Pro Asn Asn Ala Val
385 390 395 400

Ile His Asn Ser Trp Ile Asp Pro Lys Arg Asn Lys Lys Ile Thr Phe
405 410 415

Glu Asp Ser Glu Ile Arg Glu Lys Arg Leu Val Pro Gln Val Val Thr
420 425 430

Asp



(50) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GTGAAGCTTG CCTCTGGTGC CTGCAGGAGG

(51) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GCAGAATTCC CGGTGGCGTG TTGTGGTGCC C 

(52) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1209 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

ATGTTGTGTC CTTCCAAGAC AGATGGCTCA GGGCACTCTG GTAGGATTCA CCAGGAAACT 60

CATGGAGAAG GGAAAAGGGA CAAGATTAGC AACAGTGAAG GGAGGGAGAA TGGTGGGAGA 120

GGATTCCAGA TGAACGGTGG GTCGCTGGAG GCTGAGCATG CCAGCAGGAT GTCAGTTCTC 180

AGAGCAAAGC CCATGTCAAA CAGCCAACGC TTGCTCCTTC TGTCCCCAGG ATCACCTCCT 240

CGCACGGGGA GCATCTCCTA CATCAACATC ATCATGCCTT CGGTGTTCGG CACCATCTGC 300

CTCCTGGGCA TCATCGGGAA CTCCACGGTC ATCTTCGCGG TCGTGAAGAA GTCCAAGCTG 360

CACTGGTGCA ACAACGTCCC CGACATCTTC ATCATCAACC TCTCGGTAGT AGATCTCCTC 420

TTTCTCCTGG GCATGCCCTT CATGATCCAC CAGCTCATGG GCAATGGGGT GTGGCACTTT 480

GGGGAGACCA TGTGCACCCT CATCACGGCC ATGGATGCCA ATAGTCAGTT CACCAGCACC 540

TACATCCTGA CCGCCATGGC CATTGACCGC TACCTGGCCA CTGTCCACCC CATCTCTTCC 600

ACGAAGTTCC GGAAGCCCTC TGTGGCCACC CTGGTGATCT GCCTCCTGTG GGCCCTCTCC 660

TTCATCAGCA TCACCCCTGT GTGGCTGTAT GCCAGACTCA TCCCCTTCCC AGGAGGTGCA 720

GTGGGCTGCG GCATACGCCT GCCCAACCCA GACACTGACC TCTACTGGTT CACCCTGTAC 780

CAGTTTTTCC TGGCCTTTGC CCTGCCTTTT GTGGTCATCA CAGCCGCATA CGTGAGGATC 840

CTGCAGCGCA TGACGTCCTC AGTGGCCCCC GCCTCCCAGC GCAGCATCCG GCTGCGGACA 900

AAGAGGGTGA CCCGCACAGC CATCGCCATC TGTCTGGTCT TCTTTGTGTG CTGGGCACCC 960

TACTATGTGC TACAGCTGAC CCAGTTGTCC ATCAGCCGCC CGACCCTCAC CTTTGTCTAC 1020

TTATACAATG CGGCCATCAG CTTGGGCTAT GCCAACAGCT GCCTCAACCC CTTTGTGTAC 1080

ATCGTGCTCT GTGAGACGTT CCGCAAACGC TTGGTCCTGT CGGTGAAGCC TGCAGCCCAG 1140

GGGCAGCTTC GCGCTGTCAG CAACGCTCAG ACGGCTGACG AGGAGAGGAC AGAAAGCAAA 1200

GGCACCTGA 1209
(53) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 402 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Met Leu Cys Pro Ser Lys Thr Asp Gly Ser Gly His Ser Gly Arg Ile
1 5 10 15

His Gln Glu Thr His Gly Glu Gly Lys Arg Asp Lys Ile Ser Asn Ser
20 25 30

Glu Gly Arg Glu Asn Gly Gly Arg Gly Phe Gln Met Asn Gly Gly Ser
35 40 45

Leu Glu Ala Glu His Ala Ser Arg Met Ser Val Leu Arg Ala Lys Pro
50 55 60

Met Ser Asn Ser Gln Arg Leu Leu Leu Leu Ser Pro Gly Ser Pro Pro
65 70 75 80

Arg Thr Gly Ser Ile Ser Tyr Ile Asn Ile Ile Met Pro Ser Val Phe
85 90 95

Gly Thr Ile Cys Leu Leu Gly Ile Ile Gly Asn Ser Thr Val Ile Phe
100 105 110

Ala Val Val Lys Lys Ser Lys Leu His Trp Cys Asn Asn Val Pro Asp
115 120 125

Ile Phe Ile Ile Asn Leu Ser Val Val Asp Leu Leu Phe Leu Leu Gly
130 135 140

Met Pro Phe Met Ile His Gln Leu Met Gly Asn Gly Val Trp His Phe
145 150 155 160

Gly Glu Thr Met Cys Thr Leu Ile Thr Ala Met Asp Ala Asn Ser Gln
165 170 175

Phe Thr Ser Thr Tyr Ile Leu Thr Ala Met Ala Ile Asp Arg Tyr Leu
180 185 190

Ala Thr Val His Pro Ile Ser Ser Thr Lys Phe Arg Lys Pro Ser Val
195 200 205

Ala Thr Leu Val Ile Cys Leu Leu Trp Ala Leu Ser Phe Ile Ser Ile
210 215 220

Thr Pro Val Trp Leu Tyr Ala Arg Leu Ile Pro Phe Pro Gly Gly Ala
225 230 235 240

Val Gly Cys Gly Ile Arg Leu Pro Asn Pro Asp Thr Asp Leu Tyr Trp
245 250 255

Phe Thr Leu Tyr Gln Phe Phe Leu Ala Phe Ala Leu Pro Phe Val Val
260 265 270

Ile Thr Ala Ala Tyr Val Arg Ile Leu Gln Arg Met Thr Ser Ser Val
275 280 285

Ala Pro Ala Ser Gln Arg Ser Ile Arg Leu Arg Thr Lys Arg Val Thr
290 295 300

Arg Thr Ala Ile Ala Ile Cys Leu Val Phe Phe Val Cys Trp Ala Pro
305 310 315 320

Tyr Tyr Val Leu Gln Leu Thr Gln Leu Ser Ile Ser Arg Pro Thr Leu
325 330 335

Thr Phe Val Tyr Leu Tyr Asn Ala Ala Ile Ser Leu Gly Tyr Ala Asn
340 345 350

Ser Cys Leu Asn Pro Phe Val Tyr Ile Val Leu Cys Glu Thr Phe Arg
355 360 365

Lys Arg Leu Val Leu Ser Val Lys Pro Ala Ala Gln Gly Gln Leu Arg
370 375 380

Ala Val Ser Asn Ala Gln Thr Ala Asp Glu Glu Arg Thr Glu Ser Lys
385 390 395 400

Gly Thr

(54) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
GGCGGATCCA TGGATGTGAC TTCCCAA 
(55) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GGCGGATCCC TACACGGCAC TGCTGAA 27
(56) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAC 60

GCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACG 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATCT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GCTCCGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCGCCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTGA 1128
(57) INFORMATION FOR SEQ ID NO: 56: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala His Ala Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Leu Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Ala Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375

(58) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:



AAGGAATTCA CGGCCGGGTG ATGCCATTCC C 
(59) INFORMATION FOR SEQ ID NO: 58:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 



GGTGGATCCA TAAACACGGG CGTTGAGGAC
(60) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 960 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 

ATGCCATTCC CAAACTGCTC AGCCCCCAGC ACTGTGGTGG CCACAGCTGT GGGTGTCTTG 60

CTGGGGCTGG AGTGTGGGCT GGGTCTGCTG GGCAACGCGG TGGCGCTGTG GACCTTCCTG 120

TTCCGGGTCA GGGTGTGGAA GCCGTACGCT GTCTACCTGC TCAACCTGGC CCTGGCTGAC 180

CTGCTGTTGG CTGCGTGCCT GCCTTTCCTG GCCGCCTTCT ACCTGAGCCT CCAGGCTTGG 240

CATCTGGGCC GTGTGGGCTG CTGGGCCCTG CGCTTCCTGC TGGACCTCAG CCGCAGCGTG 300

GGGATGGCCT TCCTGGCCGC CGTGGCTTTG GACCGGTACC TCCGTGTGGT CCACCCTCGG 360

CTTAAGGTCA ACCTGCTGTC TCCTCAGGCG GCCCTGGGGG TCTCGGGCCT CGTCTGGCTC 420

CTGATGGTCG CCCTCACCTG CCCGGGCTTG CTCATCTCTG AGGCCGCCCA GAACTCCACC 480

AGGTGCCACA GTTTCTACTC CAGGGCAGAC GGCTCCTTCA GCATCATCTG GCAGGAAGCA 540

CTCTCCTGCC TTCAGTTTGT CCTCCCCTTT GGCCTCATCG TGTTCTGCAA TGCAGGCATC 600

ATCAGGGCTC TCCAGAAAAG ACTCCGGGAG CCTGAGAAAC AGCCCAAGCT TCAGCGGGCC 660

CAGGCACTGG TCACCTTGGT GGTGGTGCTG TTTGCTCTGT GCTTTCTGCC CTGCTTCCTG 720

GCCAGAGTCC TGATGCACAT CTTCCAGAAT CTGGGGAGCT GCAGGGCCCT TTGTGCAGTG 780

GCTCATACCT CGGATGTCAC GGGCAGCCTC ACCTACCTGC ACAGTGTCGT CAACCCCGTG 840

GTATACTGCT TCTCCAGCCC CACCTTCAGG AGCTCCTATC GGAGGGTCTT CCACACCCTC 900

CGAGGCAAAG GGCAGGCAGC AGAGCCCCCA GATTTCAACC CCAGAGACTC CTATTCCTGA 960
(61) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 319 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
Met Pro Phe Pro Asn Cys Ser Ala Pro Ser Thr Val Val Ala Thr Ala
1 5 10 15

Val Gly Val Leu Leu Gly Leu Glu Cys Gly Leu Gly Leu Leu Gly Asn
20 25 30

Ala Val Ala Leu Trp Thr Phe Leu Phe Arg Val Arg Val Trp Lys Pro
35 40 45

Tyr Ala Val Tyr Leu Leu Asn Leu Ala Leu Ala Asp Leu Leu Leu Ala
50 55 60

Ala Cys Leu Pro Phe Leu Ala Ala Phe Tyr Leu Ser Leu Gln Ala Trp
65 70 75 80

His Leu Gly Arg Val Gly Cys Trp Ala Leu Arg Phe Leu Leu Asp Leu
85 90 95

Ser Arg Ser Val Gly Met Ala Phe Leu Ala Ala Val Ala Leu Asp Arg
100 105 110

Tyr Leu Arg Val Val His Pro Arg Leu Lys Val Asn Leu Leu Ser Pro
115 120 125

Gln Ala Ala Leu Gly Val Ser Gly Leu Val Trp Leu Leu Met Val Ala
130 135 140

Leu Thr Cys Pro Gly Leu Leu Ile Ser Glu Ala Ala Gln Asn Ser Thr
145 150 155 160

Arg Cys His Ser Phe Tyr Ser Arg Ala Asp Gly Ser Phe Ser Ile Ile
165 170 175

Trp Gln Glu Ala Leu Ser Cys Leu Gln Phe Val Leu Pro Phe Gly Leu
180 185 190

Ile Val Phe Cys Asn Ala Gly Ile Ile Arg Ala Leu Gln Lys Arg Leu
195 200 205

Arg Glu Pro Glu Lys Gln Pro Lys Leu Gln Arg Ala Gln Ala Leu Val
210 215 220

Thr Leu Val Val Val Leu Phe Ala Leu Cys Phe Leu Pro Cys Phe Leu
225 230 235 240

Ala Arg Val Leu Met His Ile Phe Gln Asn Leu Gly Ser Cys Arg Ala
245 250 255

Leu Cys Ala Val Ala His Thr Ser Asp Val Thr Gly Ser Leu Thr Tyr
260 265 270

Leu His Ser Val Val Asn Pro Val Val Tyr Cys Phe Ser Ser Pro Thr
275 280 285

Phe Arg Ser Ser Tyr Arg Arg Val Phe His Thr Leu Arg Gly Lys Gly
290 295 300

Gln Ala Ala Glu Pro Pro Asp Phe Asn Pro Arg Asp Ser Tyr Ser
305 310 315

(62) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1143 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

ATGGAGGAAG GTGGTGATTT TGACAACTAC TATGGGGCAG ACAACCAGTC TGAGTGTGAG 60

TACACAGACT GGAAATCCTC GGGGGCCCTC ATCCCTGCCA TCTACATGTT GGTCTTCCTC 120

CTGGGCACCA CGGGAAACGG TCTGGTGCTC TGGACCGTGT TTCGGAGCAG CCGGGAGAAG 180

AGGCGCTCAG CTGATATCTT CATTGCTAGC CTGGCGGTGG CTGACCTGAC CTTCGTGGTG 240

ACGCTGCCCC TGTGGGCTAC CTACACGTAC CGGGACTATG ACTGGCCCTT TGGGACCTTC 300

TTCTGCAAGC TCAGCAGCTA CCTCATCTTC GTCAACATGT ACGCCAGCGT CTTCTGCCTC 360

ACCGGCCTCA GCTTCGACCG CTACCTGGCC ATCGTGAGGC CAGTGGCCAA TGCTCGGCTG 420

AGGCTGCGGG TCAGCGGGGC CGTGGCCACG GCAGTTCTTT GGGTGCTGGC CGCCCTCCTG 480

GCCATGCCTG TCATGGTGTT ACGCACCACC GGGGACTTGG AGAACACCAC TAAGGTGCAG 540

TGCTACATGG ACTACTCCAT GGTGGCCACT GTGAGCTCAG AGTGGGCCTG GGAGGTGGGC 600

CTTGGGGTCT CGTCCACCAC CGTGGGCTTT GTGGTGCCCT TCACCATCAT GCTGACCTGT 660

TACTTCTTCA TCGCCCAAAC CATCGCTGGC CACTTCCGCA AGGAACGCAT CGAGGGCCTG 720

CGGAAGCGGC GCCGGCTGCT CAGCATCATC GTGGTGCTGG TGGTGACCTT TGCCCTGTGC 780

TGGATGCCCT ACCACCTGGT GAAGACGCTG TACATGCTGG GCAGCCTGCT GCACTGGCCC 840

TGTGACTTTG ACCTCTTCCT CATGAACATC TTCCCCTACT GCACCTGCAT CAGCTACGTC 900

AACAGCTGCC TCAACCCCTT CCTCTATGCC TTTTTCGACC CCCGCTTCCG CCAGGCCTGC 960

ACCTCCATGC TCTGCTGTGG CCAGAGCAGG TGCGCAGGCA CCTCCCACAG CAGCAGTGGG 1020

GAGAAGTCAG CCAGCTACTC TTCGGGGCAC AGCCAGGGGC CCGGCCCCAA CATGGGCAAG 1080

GGTGGAGAAC AGATGCACGA GAAATCCATC CCCTACAGCC AGGAGACCCT TGTGGTTGAC 1140

TAG 1143

(63) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 380 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:

Met Glu Glu Gly Gly Asp Phe Asp Asn Tyr Tyr Gly Ala Asp Asn Gln
1 5 10 15

Ser Glu Cys Glu Tyr Thr Asp Trp Lys Ser Ser Gly Ala Leu Ile Pro
20 25 30

Ala Ile Tyr Met Leu Val Phe Leu Leu Gly Thr Thr Gly Asn Gly Leu
35 40 45

Val Leu Trp Thr Val Phe Arg Ser Ser Arg Glu Lys Arg Arg Ser Ala
50 55 60

Asp Ile Phe Ile Ala Ser Leu Ala Val Ala Asp Leu Thr Phe Val Val
65 70 75 80

Thr Leu Pro Leu Trp Ala Thr Tyr Thr Tyr Arg Asp Tyr Asp Trp Pro
85 90 95

Phe Gly Thr Phe Phe Cys Lys Leu Ser Ser Tyr Leu Ile Phe Val Asn
100 105 110

Met Tyr Ala Ser Val Phe Cys Leu Thr Gly Leu Ser Phe Asp Arg Tyr
115 120 125

Leu Ala Ile Val Arg Pro Val Ala Asn Ala Arg Leu Arg Leu Arg Val
130 135 140

Ser Gly Ala Val Ala Thr Ala Val Leu Trp Val Leu Ala Ala Leu Leu
145 150 155 160

Ala Met Pro Val Met Val Leu Arg Thr Thr Gly Asp Leu Glu Asn Thr
165 170 175

Thr Lys Val Gln Cys Tyr Met Asp Tyr Ser Met Val Ala Thr Val Ser
180 185 190

Ser Glu Trp Ala Trp Glu Val Gly Leu Gly Val Ser Ser Thr Thr Val
195 200 205

Gly Phe Val Val Pro Phe Thr Ile Met Leu Thr Cys Tyr Phe Phe Ile
210 215 220

Ala Gln Thr Ile Ala Gly His Phe Arg Lys Glu Arg Ile Glu Gly Leu
225 230 235 240

Arg Lys Arg Arg Arg Leu Leu Ser Ile Ile Val Val Leu Val Val Thr
245 250 255

Phe Ala Leu Cys Trp Met Pro Tyr His Leu Val Lys Thr Leu Tyr Met
260 265 270

Leu Gly Ser Leu Leu His Trp Pro Cys Asp Phe Asp Leu Phe Leu Met
275 280 285

Asn Ile Phe Pro Tyr Cys Thr Cys Ile Ser Tyr Val Asn Ser Cys Leu
290 295 300

Asn Pro Phe Leu Tyr Ala Phe Phe Asp Pro Arg Phe Arg Gln Ala Cys
305 310 315 320

Thr Ser Met Leu Cys Cys Gly Gln Ser Arg Cys Ala Gly Thr Ser His
325 330 335

Ser Ser Ser Gly Glu Lys Ser Ala Ser Tyr Ser Ser Gly His Ser Gln
340 345 350

Gly Pro Gly Pro Asn Met Gly Lys Gly Gly Glu Gln Met His Glu Lys
355 360 365

Ser Ile Pro Tyr Ser Gln Glu Thr Leu Val Val Asp
370 375 380

(64) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
TGAGAATTCT GGTGACTCAC AGCCGGCACA G 31 

(65) INFORMATION FOR SEQ ID NO:64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 



WO 00/22129 PCT/US99/23938


GCCGGATCCA AGGAAAAGCA GCAATAAAAG G 31
(66) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1119 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

ATGAACTACC CGCTAACGCT GGAAATGGAC CTCGAGAACC TGGAGGACCT GTTCTGGGAA 60

CTGGACAGAT TGGACAACTA TAACGACACC TCCCTGGTGG AAAATCATCT CTGCCCTGCC 120

ACAGAGGGTC CCCTCATGGC CTCCTTCAAG GCCGTGTTCG TGCCCGTGGC CTACAGCCTC 180

ATCTTCCTCC TGGGCGTGAT CGGCAACGTC CTGGTGCTGG TGATCCTGGA GCGGCACCGG 240

CAGACACGCA GTTCCACGGA GACCTTCCTG TTCCACCTGG CCGTGGCCGA CCTCCTGCTG 300

GTCTTCATCT TGCCCTTTGC CGTGGCCGAG GGCTCTGTGG GCTGGGTCCT GGGGACCTTC 360

CTCTGCAAAA CTGTGATTGC CCTGCACAAA GTCAACTTCT ACTGCAGCAG CCTGCTCCTG 420

GCCTGCATCG CCGTGGACCG CTACCTGGCC ATTGTCCACG CCGTCCATGC CTACCGCCAC 480

CGCCGCCTCC TCTCCATCCA CATCACCTGT GGGACCATCT GGCTGGTGGG CTTCCTCCTT 540

GCCTTGCCAG AGATTCTCTT CGCCAAAGTC AGCCAAGGCC ATCACAACAA CTCCCTGCCA 600

CGTTGCACCT TCTCCCAAGA GAACCAAGCA GAAACGCATG CCTGGTTCAC CTCCCGATTC 660

CTCTACCATG TGGCGGGATT CCTGCTGCCC ATGCTGGTGA TGGGCTGGTG CTACGTGGGG 720

GTAGTGCACA GGTTGCGCCA GGCCCAGCGG CGCCCTCAGC GGCAGAAGGC AGTCAGGGTG 780

GCCATCCTGG TGACAAGCAT CTTCTTCCTC TGCTGGTCAC CCTACCACAT CGTCATCTTC 840

CTGGACACCC TGGCGAGGCT GAAGGCCGTG GACAATACCT GCAAGCTGAA TGGCTCTCTC 900

CCCGTGGCCA TCACCATGTG TGAGTTCCTG GGCCTGGCCC ACTGCTGCCT CAACCCCATG 960

CTCTACACTT TCGCCGGCGT GAAGTTCCGC AGTGACCTGT CGCGGCTCCT GACCAAGCTG 1020

GGCTGTACCG GCCCTGCCTC CCTGTGCCAG CTCTTCCCTA GCTGGCGCAG GAGCAGTCTC 1080

TCTGAGTCAG AGAATGCCAC CTCTCTCACC ACGTTCTAG 1119
(67) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 372 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:



Met Asn Tyr Pro Leu Thr Leu Glu Met Asp Leu Glu Asn Leu Glu Asp
1 5 10 15

Leu Phe Trp Glu Leu Asp Arg Leu Asp Asn Tyr Asn Asp Thr Ser Leu
20 25 30

Val Glu Asn His Leu Cys Pro Ala Thr Glu Gly Pro Leu Met Ala Ser
35 40 45

Phe Lys Ala Val Phe Val Pro Val Ala Tyr Ser Leu Ile Phe Leu Leu
50 55 60

Gly Val Ile Gly Asn Val Leu Val Leu Val Ile Leu Glu Arg His Arg
65 70 75 80

Gln Thr Arg Ser Ser Thr Glu Thr Phe Leu Phe His Leu Ala Val Ala
85 90 95

Asp Leu Leu Leu Val Phe Ile Leu Pro Phe Ala Val Ala Glu Gly Ser
100 105 110

Val Gly Trp Val Leu Gly Thr Phe Leu Cys Lys Thr Val Ile Ala Leu
115 120 125

His Lys Val Asn Phe Tyr Cys Ser Ser Leu Leu Leu Ala Cys Ile Ala
130 135 140

Val Asp Arg Tyr Leu Ala Ile Val His Ala Val His Ala Tyr Arg His
145 150 155 160

Arg Arg Leu Leu Ser Ile His Ile Thr Cys Gly Thr Ile Trp Leu Val
165 170 175

Gly Phe Leu Leu Ala Leu Pro Glu Ile Leu Phe Ala Lys Val Ser Gln
180 185 190

Gly His His Asn Asn Ser Leu Pro Arg Cys Thr Phe Ser Gln Glu Asn
195 200 205

Gln Ala Glu Thr His Ala Trp Phe Thr Ser Arg Phe Leu Tyr His Val
210 215 220

Ala Gly Phe Leu Leu Pro Met Leu Val Met Gly Trp Cys Tyr Val Gly
225 230 235 240

Val Val His Arg Leu Arg Gln Ala Gln Arg Arg Pro Gln Arg Gln Lys
245 250 255

Ala Val Arg Val Ala Ile Leu Val Thr Ser Ile Phe Phe Leu Cys Trp
260 265 270

Ser Pro Tyr His Ile Val Ile Phe Leu Asp Thr Leu Ala Arg Leu Lys
275 280 285

Ala Val Asp Asn Thr Cys Lys Leu Asn Gly Ser Leu Pro Val Ala Ile
290 295 300

Thr Met Cys Glu Phe Leu Gly Leu Ala His Cys Cys Leu Asn Pro Met
305 310 315 320

Leu Tyr Thr Phe Ala Gly Val Lys Phe Arg Ser Asp Leu Ser Arg Leu
325 330 335

Leu Thr Lys Leu Gly Cys Thr Gly Pro Ala Ser Leu Cys Gln Leu Phe
340 345 350

Pro Ser Trp Arg Arg Ser Ser Leu Ser Glu Ser Glu Asn Ala Thr Ser
355 360 365

Leu Thr Thr Phe
370

(68) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
CAAAGCTTGA AAGCTGCACG GTGCAGAGAC 30 

(69) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
GCGGATCCCG AGTCACACCC TGGCTGGGCC 30

(70) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAG 60

CCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GCTCCGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCACCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTAG 1128
(71) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 375 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala Gln Pro Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Leu Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Thr Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375

(72) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
ACAGAATTCC TGTGTGGTTT TACCGCCCAG 

(73) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
CTCGGATCCA GGCAGAAGAG TCGCCTATGG 

(74) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1137 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 



WO 00/22129

PCT/US99/23938

57

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 

ATGGACCTGG GGAAACCAAT GA;UU\GCGTG CTGGTGGTGG CTCTCCTTGT CATTTTCCAG 60

GTATGCCTGT GTCAAGATGA GGTCACGGAC GATTACATCG GAGACAACAC CACAGTGGAC 120

TACACTTTGT TCGAGTCTTT GTGCTCCAAG AAGGACGTGC GGAACTTTAA AGCCTGGTTC 180

CTCCCTATCA TGTACTCCAT CATTTGTTTC GTGGGCCTAC TGGGCAATGG GCTGGTCGTG 240

TTGACCTATA TCTATTTCAA GAGGCTCAAG ACCATGACCG ATACCTACCT GCTCAACCTG 300

GCGGTGGCAG ACATCCTCTT CCTCCTGACC CTTCCCTTCT GGGCCTACAG CGCGGCCAAG 360

TCCTGGGTCT TCGGTGTCCA CTTTTGCAAG CTCATCTTTG CCATCTACAA GATGAGCTTC 420

TTCAGTGGCA TGCTCCTACT TCTTTGCATC AGCATTGACC GCTACGTGGC CATCGTCCAG 480

GCTGTCTCAG CTCACCGCCA CCGTGCCCGC GTCCTTCTCA TCAGCAAGCT GTCCTGTGTG 540

GGCATCTGGA TACTAGCCAC AGTGCTCTCC ATCCCAGAGC TCCTGTACAG TGACCTCCAG 600

AGGAGCAGCA GTGAGCAAGC GATGCGATGC TCTCTCATCA CAGAGCATGT GGAGGCCTTT 660

ATCACCATCC AGGTGGCCCA GATGGTGATC GGCTTTCTGG TCCCCCTGCT GGCCATGAGC 720

TTCTGTTACC TTGTCATCAT CCGCACCCTG CTCCAGGCAC GCAACTTTGA GCGCAACAAG 780

GCCATCAAGG TGATCATCGC TGTGGTCGTG GTCTTCATAG TCTTCCAGCT GCCGTACAAT 840

GGGGTGGTCC TGGCCCAGAC GGTGGCCAAC TTCAACATCA CCAGTAGCAC CTGTGAGCTC 900

AGTAAGCAAC TCAACATCGC CTACGACGTC ACCTACAGCC TGGCCTGCGT CCGCTGCTGC 960

GTCAACCCTT TCTTGTACGC CTTCATCGGC GTCAAGTTCC GCAACGATCT CTTCAAGCTC 1020

TTCAAGGACC TGGGCTGCCT CAGCCAGGAG CAGCTCCGGC AGTGGTCTTC CTGTCGGCAC 1080

ATCCGGCGCT CCTCCATGAG TGTGGAGGCC GAGACCACCA CCACCTTCTC CCCATAG 1137
(75) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 378 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

Met Asp Leu Gly Lys Pro Met Lys Ser Val Leu Val Val Ala Leu Leu
1 5 10 15

Val Ile Phe Gln Val Cys Leu Cys Gln Asp Glu Val Thr Asp Asp Tyr
20 25 30

Ile Gly Asp Asn Thr Thr Val Asp Tyr Thr Leu Phe Glu Ser Leu Cys
35 40 45

Ser Lys Lys Asp Val Arg Asn Phe Lys Ala Trp Phe Leu Pro Ile Met
50 55 60

Tyr Ser Ile Ile Cys Phe Val Gly Leu Leu Gly Asn Gly Leu Val Val
65 70 75 80

Leu Thr Tyr Ile Tyr Phe Lys Arg Leu Lys Thr Met Thr Asp Thr Tyr
85 90 95

Leu Leu Asn Leu Ala Val Ala Asp Ile Leu Phe Leu Leu Thr Leu Pro
100 105 110

Phe Trp Ala Tyr Ser Ala Ala Lys Ser Trp Val Phe Gly Val His Phe
115 120 125

Cys Lys Leu Ile Phe Ala Ile Tyr Lys Met Ser Phe Phe Ser Gly Met
130 135 140

Leu Leu Leu Leu Cys Ile Ser Ile Asp Arg Tyr Val Ala Ile Val Gln
145 150 155 160

Ala Val Ser Ala His Arg His Arg Ala Arg Val Leu Leu Ile Ser Lys
165 170 175

Leu Ser Cys Val Gly Ile Trp Ile Leu Ala Thr Val Leu Ser Ile Pro
180 185 190

Glu Leu Leu Tyr Ser Asp Leu Gln Arg Ser Ser Ser Glu Gln Ala Met
195 200 205

Arg Cys Ser Leu Ile Thr Glu His Val Glu Ala Phe Ile Thr Ile Gln
210 215 220

Val Ala Gln Met Val Ile Gly Phe Leu Val Pro Leu Leu Ala Met Ser
225 230 235 240

Phe Cys Tyr Leu Val Ile Ile Arg Thr Leu Leu Gln Ala Arg Asn Phe
245 250 255

Glu Arg Asn Lys Ala Ile Lys Val Ile Ile Ala Val Val Val Val Phe
260 265 270

Ile Val Phe Gln Leu Pro Tyr Asn Gly Val Val Leu Ala Gln Thr Val
275 280 285

Ala Asn Phe Asn Ile Thr Ser Ser Thr Cys Glu Leu Ser Lys Gln Leu
290 295 300

Asn Ile Ala Tyr Asp Val Thr Tyr Ser Leu Ala Cys Val Arg Cys Cys
305 310 315 320

Val Asn Pro Phe Leu Tyr Ala Phe Ile Gly Val Lys Phe Arg Asn Asp
325 330 335

Leu Phe Lys Leu Phe Lys Asp Leu Gly Cys Leu Ser Gln Glu Gln Leu
340 345 350

Arg Gln Trp Ser Ser Cys Arg His Ile Arg Arg Ser Ser Met Ser Val
355 360 365

Glu Ala Glu Thr Thr Thr Thr Phe Ser Pro
370 375

(76) INFORMATION FOR SEQ ID NO: 75:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:
CTGGAATTCA CCTGGACCAC CACCAATGGA TA 32 

(77) INFORMATION FOR SEQ ID NO:76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 
CTCGGATCCT GCAAAGTTTG TCATACAGTT 30

(78) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1086 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

ATGGATATAC AAATGGCAAA CAATTTTACT CCGCCCTCTG CAACTCCTCA GGGAAATGAC 60

TGTGACCTCT ATGCACATCA CAGCACGGCC AGGATAGTAA TGCCTCTGCA TTACAGCCTC 120

GTCTTCATCA TTGGGCTCGT GGGAAACTTA CTAGCCTTGG TCGTCATTGT TCAAAACAGG 180

AAAAAAATCA ACTCTACCAC CCTCTATTCA ACAAATTTGG TGATTTCTGA TATACTTTTT 240

ACCACGGCTT TGCCTACACG AATAGCCTAC TATGCAATGG GCTTTGACTG GAGAATCGGA 300

GATGCCTTGT GTAGGATAAC TGCGCTAGTG TTTTACATCA ACACATATGC AGGTGTGAAC 360

TTTATGACCT GCCTGAGTAT TGACCGCTTC ATTGCTGTGG TGCACCCTCT ACGCTACAAC 420

AAGATAAAAA GGATTGAACA TGCAAAAGGC GTGTGCATAT TTGTCTGGAT TCTAGTATTT 480

GCTCAGACAC TCCCACTCCT CATCAACCCT ATGTCAAAGC AGGAGGCTGA AAGGATTACA 540

TGCATGGAGT ATCCAAACTT TGAAGAAACT AAATCTCTTC CCTGGATTCT GCTTGGGGCA 600

TGTTTCATAG GATATGTACT TCCACTTATA ATCATTCTCA TCTGCTATTC TCAGATCTGC 660

TGCAAACTCT TCAGAACTGC CAAACAAAAC CCACTCACTG AGAAATCTGG TGTAAACAAA 720

AAGGCTCTCA ACACAATTAT TCTTATTATT GTTGTGTTTG TTCTCTGTTT CACACCTTAC 780

CATGTTGCAA TTATTCAACA TATGATTAAG AAGCTTCGTT TCTCTAATTT CCTGGAATGT 840

AGCCAAAGAC ATTCGTTCCA GATTTCTCTG CACTTTACAG TATGCCTGAT GAACTTCAAT 900

TGCTGCATGG ACCCTTTTAT CTACTTCTTT GCATGTAAAG GGTATAAGAG AAAGGTTATG 960

AGGATGCTGA AACGGCAAGT CAGTGTATCG ATTTCTAGTG CTGTGAAGTC AGCCCCTGAA 1020

GAAAATTCAC GTGAAATGAC AGAAACGCAG ATGATGATAC ATTCCAAGTC TTCAAATGGA 1080

AAGTGA 1086
(79) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 361 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 

Met Asp Ile Gln Met Ala Asn Asn Phe Thr Pro Pro Ser Ala Thr Pro
1 5 10 15

Gln Gly Asn Asp Cys Asp Leu Tyr Ala His His Ser Thr Ala Arg Ile
20 25 30

Val Met Pro Leu His Tyr Ser Leu Val Phe Ile Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Ala Leu Val Val Ile Val Gln Asn Arg Lys Lys Ile Asn
50 55 60

Ser Thr Thr Leu Tyr Ser Thr Asn Leu Val Ile Ser Asp Ile Leu Phe
65 70 75 80

Thr Thr Ala Leu Pro Thr Arg Ile Ala Tyr Tyr Ala Met Gly Phe Asp
85 90 95

Trp Arg Ile Gly Asp Ala Leu Cys Arg Ile Thr Ala Leu Val Phe Tyr
100 105 110

Ile Asn Thr Tyr Ala Gly Val Asn Phe Met Thr Cys Leu Ser Ile Asp
115 120 125

Arg Phe Ile Ala Val Val His Pro Leu Arg Tyr Asn Lys Ile Lys Arg
130 135 140

Ile Glu His Ala Lys Gly Val Cys Ile Phe Val Trp Ile Leu Val Phe
145 150 155 160

Ala Gln Thr Leu Pro Leu Leu Ile Asn Pro Met Ser Lys Gln Glu Ala
165 170 175

Glu Arg Ile Thr Cys Met Glu Tyr Pro Asn Phe Glu Glu Thr Lys Ser
180 185 190

Leu Pro Trp Ile Leu Leu Gly Ala Cys Phe Ile Gly Tyr Val Leu Pro
195 200 205

Leu Ile Ile Ile Leu Ile Cys Tyr Ser Gln Ile Cys Cys Lys Leu Phe
210 215 220

Arg Thr Ala Lys Gln Asn Pro Leu Thr Glu Lys Ser Gly Val Asn Lys
225 230 235 240

Lys Ala Leu Asn Thr Ile Ile Leu Ile Ile Val Val Phe Val Leu Cys
245 250 255

Phe Thr Pro Tyr His Val Ala Ile Ile Gln His Met Ile Lys Lys Leu
260 265 270

Arg Phe Ser Asn Phe Leu Glu Cys Ser Gln Arg His Ser Phe Gln Ile
275 280 285

Ser Leu His Phe Thr Val Cys Leu Met Asn Phe Asn Cys Cys Met Asp
290 295 300

Pro Phe Ile Tyr Phe Phe Ala Cys Lys Gly Tyr Lys Arg Lys Val Met
305 310 315 320

Arg Met Leu Lys Arg Gln Val Ser Val Ser Ile Ser Ser Ala Val Lys
325 330 335

Ser Ala Pro Glu Glu Asn Ser Arg Glu Met Thr Glu Thr Gln Met Met
340 345 350

Ile His Ser Lys Ser Ser Asn Gly Lys
355 360



(80) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
CTGGAATTCT CCTGCTCATC CAGCCATGCG G 31 

(81) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
CCTGGATCCC CACCCCTACT GGGGCCTCAG 30

(82) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1446 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 
ATGCGGTGGC TGTGGCCCCT GGCTGTCTCT CTTGCTGTGA TTTTGGCTGT GGGGCTAAGC 60

AGGGTCTCTG GGGGTGCCCC CCTGCACCTG GGCAGGCACA GAGCCGAGAC CCAGGAGCAG 120

CAGAGCCGAT CCAAGAGGGG CACCGAGGAT GAGGAGGCCA AGGGCGTGCA GCAGTATGTG 180

CCTGAGGAGT GGGCGGAGTA CCCCCGGCCC ATTCACCCTG CTGGCCTGCA GCCAACCAAG 240

CCCTTGGTGG CCACCAGCCC TAACCCCGAC AAGGATGGGG GCACCCCAGA CAGTGGGCAG 300

GAACTGAGGG GCAATCTGAC AGGGGCACCA GGGCAGAGGC TACAGATCCA GAACCCCCTG 360

TATCCGGTGA CCGAGAGCTC CTACAGTGCC TATGCCATCA TGCTTCTGGC GCTGGTGGTG 420

TTTGCGGTGG GCATTGTGGG CAACCTGTCG GTCATGTGCA TCGTGTGGCA CAGCTACTAC 480

CTGAAGAGCG CCTGGAACTC CATCCTTGCC AGCCTGGCCC TCTGGGATTT TCTGGTCCTC 540

TTTTTCTGCC TCCCTATTGT CATCTTCAAC GAGATCACCA AGCAGAGGCT ACTGGGTGAC 600

GTTTCTTGTC GTGCCGTGCC CTTCATGGAG GTCTCCTCTC TGGGAGTCAC GACTTTCAGC 660

CTCTGTGCCC TGGGCATTGA CCGCTTCCAC GTGGCCACCA GCACCCTGCC CAAGGTGAGG 720

CCCATCGAGC GGTGCCAATC CATCCTGGCC AAGTTGGCTG TCATCTGGGT GGGCTCCATG 780

ACGCTGGCTG TGCCTGAGCT CCTGCTGTGG CAGCTGGCAC AGGAGCCTGC CCCCACCATG 840

GGCACCCTGG ACTCATGCAT CATGAJ^CCC TCAGCCAGCC TGCCCGAGTC CCTGTATTCA 900

CTGGTGATGA CCTACCAGAA CGCCCGCATG TGGTGGTACT TTGGCTGCTA CTTCTGCCTG 960

CCCATCCTCT TCACAGTCAC CTGCCAGCTG GTGACATGGC GGGTGCGAGG CCCTCCAGGG 1020

AGGAAGTCAG AGTGCAGGGC CAGCAAGCAC GAGCAGTGTG AGAGCCAGCT CAACAGCACC 1080

GTGGTGGGCC TGACCGTGGT CTACGCCTTC TGCACCCTCC CAGAGAACGT CTGCAACATG 1140

GTGGTGGCCT ACCTCTCCAC CGAGCTGACC CGCCAGACCC TGGACCTCCT GGGCCTCATC 1200

AACCAGTTCT CCACCTTCTT CAAGGGCGCC ATCACCCCAG TGCTGCTCCT TTGCATCTGC 1260

AGGCCGCTGG GCCAGGCCTT CCTGGACTGC TGCTGCTGCT GCTGCTGTGA GGAGTGCGGC 1320

GGGGCTTCGG AGGCCTCTGC TGCCAATGGG TCGGACAACA AGCTCAAGAC CGAGGTGTCC 1380

TCTTCCATCT ACTTCCACAA GCCCAGGGAG TCACCCCCAC TCCTGCCCCT GGGCACACCT 1440

TGCTGA 1446
(83) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 481 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:

Met Arg Trp Leu Trp Pro Leu Ala Val Ser Leu Ala Val Ile Leu Ala
1 5 10 15

Val Gly Leu Ser Arg Val Ser Gly Gly Ala Pro Leu His Leu Gly Arg
20 25 30

His Arg Ala Glu Thr Gln Glu Gln Gln Ser Arg Ser Lys Arg Gly Thr
35 40 45

Glu Asp Glu Glu Ala Lys Gly Val Gln Gln Tyr Val Pro Glu Glu Trp
50 55 60

Ala Glu Tyr Pro Arg Pro Ile His Pro Ala Gly Leu Gln Pro Thr Lys
65 70 75 80

Pro Leu Val Ala Thr Ser Pro Asn Pro Asp Lys Asp Gly Gly Thr Pro
85 90 95

Asp Ser Gly Gln Glu Leu Arg Gly Asn Leu Thr Gly Ala Pro Gly Gln
100 105 110

Arg Leu Gln Ile Gln Asn Pro Leu Tyr Pro Val Thr Glu Ser Ser Tyr
115 120 125

Ser Ala Tyr Ala Ile Met Leu Leu Ala Leu Val Val Phe Ala Val Gly
130 135 140

Ile Val Gly Asn Leu Ser Val Met Cys Ile Val Trp His Ser Tyr Tyr
145 150 155 160

Leu Lys Ser Ala Trp Asn Ser Ile Leu Ala Ser Leu Ala Leu Trp Asp
165 170 175

Phe Leu Val Leu Phe Phe Cys Leu Pro Ile Val Ile Phe Asn Glu Ile
180 185 190

Thr Lys Gln Arg Leu Leu Gly Asp Val Ser Cys Arg Ala Val Pro Phe
195 200 205

Met Glu Val Ser Ser Leu Gly Val Thr Thr Phe Ser Leu Cys Ala Leu
210 215 220

Gly Ile Asp Arg Phe His Val Ala Thr Ser Thr Leu Pro Lys Val Arg
225 230 235 240

Pro Ile Glu Arg Cys Gln Ser Ile Leu Ala Lys Leu Ala Val Ile Trp
245 250 255

Val Gly Ser Met Thr Leu Ala Val Pro Glu Leu Leu Leu Trp Gln Leu
260 265 270

Ala Gln Glu Pro Ala Pro Thr Met Gly Thr Leu Asp Ser Cys Ile Met
275 280 285

Lys Pro Ser Ala Ser Leu Pro Glu Ser Leu Tyr Ser Leu Val Met Thr
290 295 300

Tyr Gln Asn Ala Arg Met Trp Trp Tyr Phe Gly Cys Tyr Phe Cys Leu
305 310 315 320

Pro Ile Leu Phe Thr Val Thr Cys Gln Leu Val Thr Trp Arg Val Arg
325 330 335

Gly Pro Pro Gly Arg Lys Ser Glu Cys Arg Ala Ser Lys His Glu Gln
340 345 350

Cys Glu Ser Gln Leu Asn Ser Thr Val Val Gly Leu Thr Val Val Tyr
355 360 365

Ala Phe Cys Thr Leu Pro Glu Asn Val Cys Asn Ile Val Val Ala Tyr
370 375 380

Leu Ser Thr Glu Leu Thr Arg Gln Thr Leu Asp Leu Leu Gly Leu Ile
385 390 395 400

Asn Gln Phe Ser Thr Phe Phe Lys Gly Ala Ile Thr Pro Val Leu Leu
405 410 415

Leu Cys Ile Cys Arg Pro Leu Gly Gln Ala Phe Leu Asp Cys Cys Cys
420 425 430

Cys Cys Cys Cys Glu Glu Cys Gly Gly Ala Ser Glu Ala Ser Ala Ala
435 440 445

Asn Gly Ser Asp Asn Lys Leu Lys Thr Glu Val Ser Ser Ser Ile Tyr
450 455 460

Phe His Lys Pro Arg Glu Ser Pro Pro Leu Leu Pro Leu Gly Thr Pro
465 470 475 480

Cys 



(84) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
ATGTGGAACG CGACGCCCAG CG 22

(85) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 
TCATGTATTA ATACTAGATT CT 

(86) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:
TACCATGTGG AACGCGACGC CCAGCGAAGA GCCGGGGT 

(87) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
CGGAATTCAT GTATTAATAC TAGATTCTGT CCAGGCCCG 

(88) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1101 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

ATGTGGAACG CGACGCCCAG CGAAGAGCCG GGGTTCAACC TCACACTGGC CGACCTGGAC 60

TGGGATGCTT CCCCCGGCAA CGACTCGCTG GGCGACGAGC TGCTGCAGCT CTTCCCCGCG 120

CCGCTGCTGG CGGGCGTCAC AGCCACCTGC GTGGCACTCT TCGTGGTGGG TATCGCTGGC 180

AACCTGCTCA CCATGCTGGT GGTGTCGCGC TTCCGCGAGC TGCGCACCAC CACCAACCTC 240

TACCTGTCCA GCATGGCCTT CTCCGATCTG CTCATCTTCC TCTGCATGCC CCTGGACCTC 300

GTTCGCCTCT GGCAGTACCG GCCCTGGAAC TTCGGCGACC TCCTCTGCAA ACTCTTCCAA 360

TTCGTCAGTG AGAGCTGCAC CTACGCCACG GTGCTCACCA TCACAGCGCT GAGCGTCGAG 420

CGCTACTTCG CCATCTGCTT CCCACTCCGG GCCAAGGTGG TGGTCACCAA GGGGCGGGTG 480

AAGCTGGTCA TCTTCGTCAT CTGGGCCGTG GCCTTCTGCA GCGCCGGGCC CATCTTCGTG 540

CTAGTCGGGG TGGAGCACGA GAACGGCACC GACCCTTGGG ACACCAACGA GTGCCGCCCC 600

ACCGAGTTTG CGGTGCGCTC TGGACTGCTC ACGGTCATGG TGTGGGTGTG CAGCATCTTC 660

TTCTTCCTTC CTGTCTTCTG TCTCACGGTC CTCTACAGTC TGATCGGCAG GAAGCTGTGG 720

CGGAGGAGGC GCGGCGATGC TGTCGTGGGT GCCTCGCTCA GGGACCAGAA CCACAAGCAA 780

ACCGTGAAAA TGCTGGCTGT AGTGGTGTTT GCCTTCATCC TCTGCTGGCT CCCCTTCCAC 840

GTAGGGCGAT ATTTATTTTC CAAATCCTTT GAGCCTGGCT CCTTGGAGAT TGCTCAGATC 900

AGCCAGTACT GCAACCTCGT GTCCTTTGTC CTCTTCTACC TCAGTGCTGC CATCAACCCC 960

ATTCTGTACA ACATCATGTC CAAGAAGTAC CGGGTGGCAG TGTTCAGACT TCTGGGATTC 1020

GAACCCTTCT CCCAGAGAAA GCTCTCCACT CTGAAAGATG AAAGTTCTCG GGCCTGGACA 1080

GAATCTAGTA TTAATACATG A 1101
(89) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 366 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

Met Trp Asn Ala Thr Pro Ser Glu Glu Pro Gly Phe Asn Leu Thr Leu 
1 5 10 15

Ala Asp Leu Asp Trp Asp Ala Ser Pro Gly Asn Asp Ser Leu Gly Asp
20 25 30

Glu Leu Leu Gln Leu Phe Pro Ala Pro Leu Leu Ala Gly Val Thr Ala
35 40 45

Thr Cys Val Ala Leu Phe Val Val Gly Ile Ala Gly Asn Leu Leu Thr
50 55 60

Met Leu Val Val Ser Arg Phe Arg Glu Leu Arg Thr Thr Thr Asn Leu
65 70 75 80

Tyr Leu Ser Ser Met Ala Phe Ser Asp Leu Leu Ile Phe Leu Cys Met
85 90 95

Pro Leu Asp Leu Val Arg Leu Trp Gln Tyr Arg Pro Trp Asn Phe Gly
100 105 110

Asp Leu Leu Cys Lys Leu Phe Gln Phe Val Ser Glu Ser Cys Thr Tyr
115 120 125

Ala Thr Val Leu Thr Ile Thr Ala Leu Ser Val Glu Arg Tyr Phe Ala
130 135 140

Ile Cys Phe Pro Leu Arg Ala Lys Val Val Val Thr Lys Gly Arg Val
145 150 155 160

Lys Leu Val Ile Phe Val Ile Trp Ala Val Ala Phe Cys Ser Ala Gly
165 170 175

Pro Ile Phe Val Leu Val Gly Val Glu His Glu Asn Gly Thr Asp Pro
180 185 190

Trp Asp Thr Asn Glu Cys Arg Pro Thr Glu Phe Ala Val Arg Ser Gly
195 200 205

Leu Leu Thr Val Met Val Trp Val Ser Ser Ile Phe Phe Phe Leu Pro
210 215 220

Val Phe Cys Leu Thr Val Leu Tyr Ser Leu Ile Gly Arg Lys Leu Trp
225 230 235 240

Arg Arg Arg Arg Gly Asp Ala Val Val Gly Ala Ser Leu Arg Asp Gln
245 250 255

Asn His Lys Gln Thr Val Lys Met Leu Ala Val Val Val Phe Ala Phe
260 265 270

Ile Leu Cys Trp Leu Pro Phe His Val Gly Arg Tyr Leu Phe Ser Lys
275 280 285

Ser Phe Glu Pro Gly Ser Leu Glu Ile Ala Gln Ile Ser Gln Tyr Cys
290 295 300

Asn Leu Val Ser Phe Val Leu Phe Tyr Leu Ser Ala Ala Ile Asn Pro
305 310 315 320

Ile Leu Tyr Asn Ile Met Ser Lys Lys Tyr Arg Val Ala Val Phe Arg
325 330 335

Leu Leu Gly Phe Glu Pro Phe Ser Gln Arg Lys Leu Ser Thr Leu Lys
340 345 350

Asp Glu Ser Ser Arg Ala Trp Thr Glu Ser Ser Ile Asn Thr
355 360 365

(90) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 
GCAAGCTTGT GCCCTCACCA AGCCATGCGA GCC 33 

(91) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 
CGGAATTCAG CAATGAGTTC CGACAGAAGC 30 

(92) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1842 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 

ATGCGAGCCC CGGGCGCGCT TCTCGCCCGC ATGTCGCGGC TACTGCTTCT GCTACTGCTC 60

AAGGTGTCTG CCTCTTCTGC CCTCGGGGTC GCCCCTGCGT CCAGAAACGA AACTTGTCTG 120

GGGGAGAGCT GTGCACCTAC AGTGATCCAG CGCCGCGGCA GGGACGCCTG GGGACCGGGA 180

AATTCTGCAA GAGACGTTCT GCGAGCCCGA GCACCCAGGG AGGAGCAGGG GGCAGCGTTT 240

CTTGCGGGAC CCTCCTGGGA CCTGCCGGCG GCCCCGGGCC GTGACCCGGC TGCAGGCAGA 300

GGGGCGGAGG CGTCGGCAGC CGGACCCCCG GGACCTCCAA CCAGGCCACC TGGCCCCTGG 360

AGGTGGAAAG GTGCTCGGGG TCAGGAGCCT TCTGAAACTT TGGGGAGAGG GAACCCCACG 420

GCCCTCCAGC TCTTCCTTCA GATCTCAGAG GAGGAAGAGA AGGGTCCCAG AGGCGCTGGC 480

ATTTCCGGGC GTAGCCAGGA GCAGAGTGTG AAGACAGTCC CCGGAGCCAG CGATCTTTTT 540

TACTGGCCAA GGAGAGCCGG GAAACTCCAG GGTTCCCACC ACAAGCCCCT GTCCAAGACG 600

GCCAATGGAC TGGCGGGGCA CGAAGGGTGG ACAATTGCAC TCCCGGGCCG GGCGCTGGCC 660

CAGAATGGAT CCTTGGGTGA AGGAATCCAT GAGCCTGGGG GTCCCCGCCG GGGAAACAGC 720

ACGAACCGGC GTGTGAGACT GAAGAACCCC TTCTACCCGC TGACCCAGGA GTCCTATGGA 780

GCCTACGCGG TCATGTGTCT GTCCGTGGTG ATCTTCGGGA CCGGCATCAT TGGCAACCTG 840

GCGGTGATGA GCATCGTGTG CCACAACTAC TACATGCGGA GCATCTCCAA CTCCCTCTTG 900

GCCAACCTGG CCTTCTGGGA CTTTCTCATC ATCTTCTTCT GCCTTCCGCT GGTCATCTTC 960

CACGAGCTGA CCAAGAAGTG GCTGCTGGAG GACTTCTCCT GCAAGATCGT GCCCTATATA 1020

GAGGTCGCTT CTCTGGGAGT CACCACTTTC ACCTTATGTG CTCTGTGCAT AGACCGCTTC 1080

CGTGCTGCCA CCAACGTACA GATGTACTAC GAAATGATCG AAAACTGTTC CTCAACAACT 1140

GCCAAACTTG CTGTTATATG GGTGGGAGCT CTATTGTTAG CACTTCCAGA AGTTGTTCTC 1200

CGCCAGCTGA GCAAGGAGGA TTTGGGGTTT AGTGGCCGAG CTCCGGCAGA AAGGTGCATT 1260

ATTAAGATCT CTCCTGATTT ACCAGACACC ATCTATGTTC TAGCCCTCAC CTACGACAGT 1320

GCGAGACTGT GGTGGTATTT TGGCTGTTAC TTTTGTTTGC CCACGCTTTT CACCATCACC 1380

TGCTCTCTAG TGACTGCGAG GAAAATCCGC AAAGCAGAGA AAGCCTGTAC CCGAGGGAAT 1440

AAACGGCAGA TTCAACTAGA GAGTCAGATG AACTGTACAG TAGTGGCACT GACCATTTTA 1500

TATGGATTTT GCATTATTCC TGAAAATATC TGCAACATTG TTACTGCCTA CATGGCTACA 1560

GGGGTTTCAC AGCAGACAAT GGACCTCCTT AATATCATCA GCCAGTTCCT TTTGTTCTTT 1620

AAGTCCTGTG TCACCCCAGT CCTCCTTTTC TGTCTCTGCA AACCCTTCAG TCGGGCCTTC 1680

ATGGAGTGCT GCTGCTGTTG CTGTGAGGAA TGCATTCAGA AGTCTTCAAC GGTGACCAGT 1740

GATGACAATG ACAACGAGTA CACCACGGAA CTCGAACTCT CGCCTTTCAG TACCATACGC 1800

CGTGAAATGT CCACTTTTGC TTCTGTCGGA ACTCATTGCT GA 1842
(93) INFORMATION FOR SEQ ID NO: 92:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 613 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 

Met Arg Ala Pro Gly Ala Leu Leu Ala Arg Met Ser Arg Leu Leu Leu 
1 5 10 15

Leu Leu Leu Leu Lys Val Ser Ala Ser Ser Ala Leu Gly Val Ala Pro
20 25 30

Ala Ser Arg Asn Glu Thr Cys Leu Gly Glu Ser Cys Ala Pro Thr Val
35 40 45

Ile Gln Arg Arg Gly Arg Asp Ala Trp Gly Pro Gly Asn Ser Ala Arg
50 55 60

Asp Val Leu Arg Ala Arg Ala Pro Arg Glu Glu Gln Gly Ala Ala Phe
65 70 75 80

Leu Ala Gly Pro Ser Trp Asp Leu Pro Ala Ala Pro Gly Arg Asp Pro
85 90 95

Ala Ala Gly Arg Gly Ala Glu Ala Ser Ala Ala Gly Pro Pro Gly Pro
100 105 110

Pro Thr Arg Pro Pro Gly Pro Trp Arg Trp Lys Gly Ala Arg Gly Gln
115 120 125

Glu Pro Ser Glu Thr Leu Gly Arg Gly Asn Pro Thr Ala Leu Gln Leu
130 135 140

Phe Leu Gln Ile Ser Glu Glu Glu Glu Lys Gly Pro Arg Gly Ala Gly
145 150 155 160

Ile Ser Gly Arg Ser Gln Glu Gln Ser Val Lys Thr Val Pro Gly Ala
165 170 175

Ser Asp Leu Phe Tyr Trp Pro Arg Arg Ala Gly Lys Leu Gln Gly Ser
180 185 190

His His Lys Pro Leu Ser Lys Thr Ala Asn Gly Leu Ala Gly His Glu
195 200 205

Gly Trp Thr Ile Ala Leu Pro Gly Arg Ala Leu Ala Gln Asn Gly Ser
210 215 220

Leu Gly Glu Gly Ile His Glu Pro Gly Gly Pro Arg Arg Gly Asn Ser
225 230 235 240

Thr Asn Arg Arg Val Arg Leu Lys Asn Pro Phe Tyr Pro Leu Thr Gln
245 250 255

Glu Ser Tyr Gly Ala Tyr Ala Val Met Cys Leu Ser Val Val Ile Phe
260 265 270

Gly Thr Gly Ile Ile Gly Asn Leu Ala Val Met Ser Ile Val Cys His
275 280 285

Asn Tyr Tyr Met Arg Ser Ile Ser Asn Ser Leu Leu Ala Asn Leu Ala
290 295 300

Phe Trp Asp Phe Leu Ile Ile Phe Phe Cys Leu Pro Leu Val Ile Phe
305 310 315 320

His Glu Leu Thr Lys Lys Trp Leu Leu Glu Asp Phe Ser Cys Lys Ile
325 330 335

Val Pro Tyr Ile Glu Val Ala Ser Leu Gly Val Thr Thr Phe Thr Leu
340 345 350

Cys Ala Leu Cys Ile Asp Arg Phe Arg Ala Ala Thr Asn Val Gln Met
355 360 365

Tyr Tyr Glu Met Ile Glu Asn Cys Ser Ser Thr Thr Ala Lys Leu Ala
370 375 380

Val Ile Trp Val Gly Ala Leu Leu Leu Ala Leu Pro Glu Val Val Leu
385 390 395 400

Arg Gln Leu Ser Lys Glu Asp Leu Gly Phe Ser Gly Arg Ala Pro Ala
405 410 415

Glu Arg Cys Ile Ile Lys Ile Ser Pro Asp Leu Pro Asp Thr Ile Tyr
420 425 430

Val Leu Ala Leu Thr Tyr Asp Ser Ala Arg Leu Trp Trp Tyr Phe Gly
435 440 445

Cys Tyr Phe Cys Leu Pro Thr Leu Phe Thr Ile Thr Cys Ser Leu Val
450 455 460

Thr Ala Arg Lys Ile Arg Lys Ala Glu Lys Ala Cys Thr Arg Gly Asn
465 470 475 480

Lys Arg Gln Ile Gln Leu Glu Ser Gln Met Asn Cys Thr Val Val Ala
485 490 495

Leu Thr Ile Leu Tyr Gly Phe Cys Ile Ile Pro Glu Asn Ile Cys Asn
500 505 510

Ile Val Thr Ala Tyr Met Ala Thr Gly Val Ser Gln Gln Thr Met Asp
515 520 525

Leu Leu Asn Ile Ile Ser Gln Phe Leu Leu Phe Phe Lys Ser Cys Val
530 535 540

Thr Pro Val Leu Leu Phe Cys Leu Cys Lys Pro Phe Ser Arg Ala Phe
545 550 555 560

Met Glu Cys Cys Cys Cys Cys Cys Glu Glu Cys Ile Gln Lys Ser Ser
565 570 575

Thr Val Thr Ser Asp Asp Asn Asp Asn Glu Tyr Thr Thr Glu Leu Glu
580 585 590

Leu Ser Pro Phe Ser Thr Ile Arg Arg Glu Met Ser Thr Phe Ala Ser
595 600 605

Val Gly Thr His Cys
610

(94) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:
CAGAATTCAG AGAAAAAAAG TGAATATGGT TTTT 34 

(95) INFORMATION FOR SEQ ID NO: 94:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 
TTGGATCCCT GGTGCATAAC AATTGAAAGA AT 32 

(96) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1248 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

ATGGTTTTTG CTCACAGAAT GGATAACAGC AAGCCACATT TGATTATTCC TACACTTCTG 60

GTGCCCCTCC AAAACCGCAG CTGCACTGAA ACAGCCACAC CTCTGCCAAG CCAATACCTG 120

ATGGAATTAA GTGAGGAGCA CAGTTGGATG AGCAACCAAA CAGACCTTCA CTATGTGCTG 180

AAACCCGGGG AAGTGGCCAC AGCCAGCATC TTCTTTGGGA TTCTGTGGTT GTTTTCTATC 240

TTCGGCAATT CCCTGGTTTG TTTGGTCATC CATAGGAGTA GGAGGACTCA GTCTACCACC 300

AACTACTTTG TGGTCTCCAT GGCATGTGCT GACCTTCTCA TCAGCGTTGC CAGCACGCCT 360

TTCGTCCTGC TCCAGTTCAC CACTGGAAGG TGGACGCTGG GTAGTGCAAC GTGCAAGGTT 420

GTGCGATATT TTCAATATCT CACTCCAGGT GTCCAGATCT ACGTTCTCCT CTCCATCTGC 480

ATAGACCGGT TCTACACCAT CGTCTATCCT CTGAGCTTCA AGGTGTCCAG AGAAAAAGCC 540

AAGAAAATGA TTGCGGCATC GTGGATCTTT GATGCAGGCT TTGTGACCCC TGTGCTCTTT 600

TTCTATGGCT CCAACTGGGA CAGTCATTGT AACTATTTCC TCCCCTCCTC TTGGGAAGGC 660

ACTGCCTACA CTGTCATCCA CTTCTTGGTG GGCTTTGTGA TTCCATCTGT CCTCATAAn^ 720

TTATTTTACC AAAAGGTCAT AAAATATATT TGGAGAATAG GCACAGATGG CCGAACGGTG 780

AGGAGGACAA TGAACATTGT CCCTCGGACA AAAGTGAAAA CTATCAAGAT GTTCCTCATT 840

TTAAATCTGT TGTTTTTGCT CTCCTGGCTG CCTTTTCATG TAGCTCAGCT ATGGCACCCC 900

CATGAACAAG ACTATAAGAA AAGTTCCCTT GTTTTCACAG CTATCACATG GATATCCTTT 960

AGTTCTTCAG CCTCTAAACC TACTCTGTAT TCAATTTATA ATGCCAATTT TCGGAGAGGG 1020

ATGAAAGAGA CTTTTTGCAT GTCCTCTATG AAATGTTACC GAAGCAATGC CTATACTATC 1080

ACAACAAGTT CAAGGATGGC CAAAAAAAAC TACGTTGGCA TTTCAGAAAT CCCTTCCATG 1140

GCCAAAACTA TTACCAAAGA CTCGATCTAT GACTCATTTG ACAGAGAAGC CAAGGAAAAA 1200

AAGCTTGCTT GGCCCATTAA CTCAAATCCA CCAAATACTT TTGTCTAA 1248
(97) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 415 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

Met Val Phe Ala His Arg Met Asp Asn Ser Lys Pro His Leu Ile Ile
1 5 10 15

Pro Thr Leu Leu Val Pro Leu Gln Asn Arg Ser Cys Thr Glu Thr Ala
20 25 30

Thr Pro Leu Pro Ser Gln Tyr Leu Met Glu Leu Ser Glu Glu His Ser
35 40 45

Trp Met Ser Asn Gln Thr Asp Leu His Tyr Val Leu Lys Pro Gly Glu
50 55 60

Val Ala Thr Ala Ser Ile Phe Phe Gly Ile Leu Trp Leu Phe Ser Ile
65 70 75 80

Phe Gly Asn Ser Leu Val Cys Leu Val Ile His Arg Ser Arg Arg Thr
85 90 95

Gln Ser Thr Thr Asn Tyr Phe Val Val Ser Met Ala Cys Ala Asp Leu
100 105 110

Leu Ile Ser Val Ala Ser Thr Pro Phe Val Leu Leu Gln Phe Thr Thr
115 120 125

Gly Arg Trp Thr Leu Gly Ser Ala Thr Cys Lys Val Val Arg Tyr Phe
130 135 140

Gln Tyr Leu Thr Pro Gly Val Gln Ile Tyr Val Leu Leu Ser Ile Cys
145 150 155 160

Ile Asp Arg Phe Tyr Thr Ile Val Tyr Pro Leu Ser Phe Lys Val Ser
165 170 175

Arg Glu Lys Ala Lys Lys Met Ile Ala Ala Ser Trp Ile Phe Asp Ala
180 185 190

Gly Phe Val Thr Pro Val Leu Phe Phe Tyr Gly Ser Asn Trp Asp Ser
195 200 205

His Cys Asn Tyr Phe Leu Pro Ser Ser Trp Glu Gly Thr Ala Tyr Thr
210 215 220

Val Ile His Phe Leu Val Gly Phe Val Ile Pro Ser Val Leu Ile Ile
225 230 235 240

Leu Phe Tyr Gln Lys Val Ile Lys Tyr Ile Trp Arg Ile Gly Thr Asp
245 250 255

Gly Arg Thr Val Arg Arg Thr Met Asn Ile Val Pro Arg Thr Lys Val
260 265 270

Lys Thr Ile Lys Met Phe Leu Ile Leu Asn Leu Leu Phe Leu Leu Ser
275 280 285

Trp Leu Pro Phe His Val Ala Gln Leu Trp His Pro His Glu Gln Asp
290 295 300

Tyr Lys Lys Ser Ser Leu Val Phe Thr Ala Ile Thr Trp Ile Ser Phe
305 310 315 320

Ser Ser Ser Ala Ser Lys Pro Thr Leu Tyr Ser Ile Tyr Asn Ala Asn
325 330 335

Phe Arg Arg Gly Met Lys Glu Thr Phe Cys Met Ser Ser Met Lys Cys
340 345 350

Tyr Arg Ser Asn Ala Tyr Thr Ile Thr Thr Ser Ser Arg Met Ala Lys
355 360 365

Lys Asn Tyr Val Gly Ile Ser Glu Ile Pro Ser Met Ala Lys Thr Ile
370 375 380

Thr Lys Asp Ser Ile Tyr Asp Ser Phe Asp Arg Glu Ala Lys Glu Lys
385 390 395 400

Lys Leu Ala Trp Pro Ile Asn Ser Asn Pro Pro Asn Thr Phe Val
405 410 415

(98) INFORMATION FOR SEQ ID NO: 97:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:



GGAAAGCTTA ACGATCCCCA GGAGCAACAT 
(99) INFORMATION FOR SEQ ID NO: 98: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 



CTGGGATCCT ACGAGAGCAT TTTTCACACA G 
(100) INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1842 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 

ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 6 0 

CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 12 0 

GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA GAAGCTCCGG 180 

AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 7.4 0 

CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 3 00 

TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTG 36 0 

GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC 42 0 

AGTGTGCGCA ATACCTGCAT CTACCTGGTC ATCACCTGGA TCATGACCGT CCTGGCTGTC 480 

CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTC 54 0 

AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 600

CTCCTCATCG TGGGTTTCTG CTACGTGAGG ATCTGGACCA AAGTGCTGGC GGCCCGTGAC 660

CCTGCAGGGC AGAATCCTGA CAACCAACTT GCTGAGGTTC GCAATTTTCT AACCATGTTT 720

GTGATCTTCC TCCTGTTTGC AGTGTGCTGG TGCCCTATCA ACGTGCTCAC TGTCTTGGTG 780

GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840

TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900

TTCCGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCCCT 960

GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020

CATGCTCGCG ACCAAGCTCG TGAACAAGAC CGTGCCCATG CCTGTCCTGC TGTGGAGGAA 1080 

ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGCAGCTGG CCACCCCGAC 1140

CGTGCCTCTG GCCACCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200 

TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260

GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320

CCTGCCTCTG TCCATTTCAA GGGTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380
AAGCCTGACT CTGTTCATTT CAAGCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440

CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAGTG CTGCCACCAG CCACCCTAAA 1500

CCCATCAAGC CAGCTACCAG CCATGCTGAG CCCACCACTG CTGACTATCC CAAGCCTGCC 1560

ACTACCAGCC ACCCTAAGCC CGCTGCTGCT GACAACCCTG AGCTCTCTGC CTCCCATTGC 1620

CCCGAGATCC CTGCCATTGC CCACCCTGTG TCTGACGACA GTGACCTCCC TGAGTCGGCC 1680

TCTAGCCCTG CCGCTGGGCC CACCAAGCCT GCTGCCAGCC AGCTGGAGTC TGACACCATC 1740

GCTGACCTTC CTGACCCTAC TGTAGTCACT ACCAGTACCA ATGATTACCA TGATGTCGTG 1800

GTTGTTGATG TTGAAGATGA TCCTGATGAA ATGGCTGTGT GA 1842
(101) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 613 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys
1 5 10 15

Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe
20 25 30 

Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met
35 40 45 

Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn
50 55 60 

Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr
65 70 75 80 

Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu
85 90 95 

Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val
100 105 110 

Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys
115 120 125 

Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn
130 135 140 
Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val
145 150 155 160 

Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr
165 170 175 

Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile
180 185 190 

Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr
195 200 205 

Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln
210 215 220 

Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Phe Leu Thr Met Phe
225 230 235 240 

Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu
245 250 255 

Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro
260 265 270 

Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys
275 280 285

Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu
290 295 300 

Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Pro
305 310 315 320 

Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala
325 330 335 

Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala
340 345 350 

His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val
355 360 365 

Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly 
370 375 380 

His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala 
385 390 395 400 

Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly 
405 410 415 

His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro
420 425 430 



Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Gly 
435 440 445

Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser
450 455 460 

Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His
465 470 475 480 

His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Ser Ala Ala Thr 
485 490 495 

Ser His Pro Lys Pro Ile Lys Pro Ala Thr Ser His Ala Glu Pro Thr
500 505 510 

Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His Pro Lys Pro Ala 
515 520 525 

Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys Pro Glu Ile Pro
530 535 540 

Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu Pro Glu Ser Ala
545 550 555 560 

Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala Ser Gln Leu Glu
565 570 575 

Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val Val Thr Thr Ser
580 585 590 

Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val Glu Asp Asp Pro 
595 600 605

Asp Glu Met Ala Val 

610 

(102) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
TCCAAGCTTC GCCATGGGAC ATAACGGGAG CT 32 

(103) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:
CGTGAATTCC AAGAATTTAC AATCCTTGCT 30

(104) INFORMATION FOR SEQ ID NO: 103:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1548 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 

ATGGGACATA ACGGGAGCTG GATCTCTCCA AATGCCAGCG AGCCGCACAA CGCGTCCGGC 60

GCCGAGGCTG CGGGTGTGAA CCGCAGCGCG CTCGGGGAGT TCGGCGAGGC GCAGCTGTAC 120

CGCCAGTTCA CCACCACCGT GCAGGTCGTC ATCTTCATAG GCTCGCTGCT CGGAAACTTC 180

ATGGTGTTAT GGTCAACTTG CCGCACAACC GTGTTCAAAT CTGTCACCAA CAGGTTCATT 240

AAAAACCTGG CCTGCTCGGG GATTTGTGCC AGCCTGGTCT GTGTGCCCTT CGACATCATC 300

CTCAGCACCA GTCCTCACTG TTGCTGGTGG ATCTACACCA TGCTCTTCTG CAAGGTCGTC 360

AAATTTTTGC ACAAAGTATT CTGCTCTGTG ACCATCCTCA GCTTCCCTGC TATTGCTTTG 420

GACAGGTACT ACTCAGTCCT CTATCCACTG GAGAGGAAAA TATCTGATGC CAAGTCCCGT 480

GAACTGGTGA TGTACATCTG GGCCCATGCA GTGGTGGCCA GTGTCCCTGT GTTTGCAGTA 540

ACCAATGTGG CTGACATCTA TGCCACGTCC ACCTGCACGG AAGTCTGGAG CAACTCCTTG 600

GGCCACCTGG TGTACGTTCT GGTGTATAAC ATCACCACGG TCATTGTGCC TGTGGTGGTG 660 

GTGTTCCTCT TCTTGATACT GATCCGACGG GCCCTGAGTG CCAGCCAGAA GAAGAAGGTC 720

ATCATAGCAG CGCTCCGGAC CCCACAGAAC ACCATCTCTA TTCCCTATGC CTCCCAGCGG 780 

GAGGCCGAGC TGCACGCCAC CCTGCTCTCC ATGGTGATGG TCTTCATCTT GTGTAGCGTG 840

CCCTATGCCA CCCTGGTCGT CTACCAGACT GTGCTCAATG TCCCTGACAC TTCCGTCTTC 900 

TTGCTGCTCA CTGCTGTTTG GCTGCCCAAA GTCTCCCTGC TGGCAAACCC TGTTCTCTTT 960

CTTACTGTGA ACAAATCTGT CCGCAAGTGC TTGATAGGGA CCCTGGTGCA ACTACACCAC 1020

CGGTACAGTC GCCGTAATGT GGTCAGTACA GGGAGTGGCA TGGCTGAGGC CAGCCTGGAA 1080
CCCAGCATAC GCTCGGGTAG CCAGCTCCTG GAGATGTTCC ACATTGGGCA GCAGCAGATC 1140

TTTAAGCCCA CAGAGGATGA GGAAGAGAGT GAGGCCAAGT ACATTGGCTC AGCTGACTTl^ 1200

CAGGCCAAGG AGATATTTAG CACCTGCCTC GAGGGAGAGC AGGGGCCACA GTTTGCGCCC 1260 

TCTGCCCCAC CCCTGAGCAC AGTGGACTCT GTATCCCAGG TGGCACCGGC AGCCCCTGTG 1320

GAACCTGAAA CATTCCCTGA TAAGTATTCC CTGCAGTTTG GCTTTGGGCC TTTTGAGTTG 1380

CCTCCTCAGT GGCTCTCAGA GACCCGAAAC AGCAAGAAGC GGCTGCTTCC CCCGTTGGGC 1440

AACACCCCAG AAGAGCTGAT CCAGACAAAG GTGCCCAAGG TAGGCAGGGT GGAGCGGAAG 1500 

ATGAGCAGAA ACAATAAAGT GAGCATTTTT CCAAAGGTGG ATTCCTAG 1548
(105) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 515 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104: 

Met Gly His Asn Gly Ser Trp Ile Ser Pro Asn Ala Ser Glu Pro His
1 5 10 15

Asn Ala Ser Gly Ala Glu Ala Ala Gly Val Asn Arg Ser Ala Leu Gly 
20 25 30 

Glu Phe Gly Glu Ala Gln Leu Tyr Arg Gln Phe Thr Thr Thr Val Gln
35 40 45 

Val Val Ile Phe Ile Gly Ser Leu Leu Gly Asn Phe Met Val Leu Trp
50 55 60 

Ser Thr Cys Arg Thr Thr Val Phe Lys Ser Val Thr Asn Arg Phe Ile
65 70 75 80

Lys Asn Leu Ala Cys Ser Gly Ile Cys Ala Ser Leu Val Cys Val Pro
85 90 95 

Phe Asp Ile Ile Leu Ser Thr Ser Pro His Cys Cys Trp Trp Ile Tyr
100 105 110 

Thr Met Leu Phe Cys Lys Val Val Lys Phe Leu His Lys Val Phe Cys 
115 120 125 

Ser Val Thr Ile Leu Ser Phe Pro Ala Ile Ala Leu Asp Arg Tyr Tyr
130 135 140 
Ser Val Leu Tyr Pro Leu Glu Arg Lys Ile Ser Asp Ala Lys Ser Arg
145 150 155 160

Glu Leu Val Met Tyr Ile Trp Ala His Ala Val Val Ala Ser Val Pro
165 170 175 

Val Phe Ala Val Thr Asn Val Ala Asp Ile Tyr Ala Thr Ser Thr Cys
180 185 190 

Thr Glu Val Trp Ser Asn Ser Leu Gly His Leu Val Tyr Val Leu Val 
195 200 205 

Tyr Asn Ile Thr Thr Val Ile Val Pro Val Val Val Val Phe Leu Phe
210 215 220 

Leu Ile Leu Ile Arg Arg Ala Leu Ser Ala Ser Gln Lys Lys Lys Val
225 230 235 240

Ile Ile Ala Ala Leu Arg Thr Pro Gln Asn Thr Ile Ser Ile Pro Tyr
245 250 255 

Ala Ser Gln Arg Glu Ala Glu Leu His Ala Thr Leu Leu Ser Met Val
260 265 270 

Met Val Phe Ile Leu Cys Ser Val Pro Tyr Ala Thr Leu Val Val Tyr
275 280 285 

Gln Thr Val Leu Asn Val Pro Asp Thr Ser Val Phe Leu Leu Leu Thr
290 295 300

Ala Val Trp Leu Pro Lys Val Ser Leu Leu Ala Asn Pro Val Leu Phe 
305 310 315 320 

Leu Thr Val Asn Lys Ser Val Arg Lys Cys Leu Ile Gly Thr Leu Val
325 330 335 

Gln Leu His His Arg Tyr Ser Arg Arg Asn Val Val Ser Thr Gly Ser
340 345 350 

Gly Met Ala Glu Ala Ser Leu Glu Pro Ser Ile Arg Ser Gly Ser Gln
355 360 365 

Leu Leu Glu Met Phe His Ile Gly Gln Gln Gln Ile Phe Lys Pro Thr
370 375 380 

Glu Asp Glu Glu Glu Ser Glu Ala Lys Tyr Ile Gly Ser Ala Asp Phe
385 390 395 400 

Gln Ala Lys Glu Ile Phe Ser Thr Cys Leu Glu Gly Glu Gln Gly Pro
405 410 415 

Gln Phe Ala Pro Ser Ala Pro Pro Leu Ser Thr Val Asp Ser Val Ser
420 425 430 



Gln Val Ala Pro Ala Ala Pro Val Glu Pro Glu Thr Phe Pro Asp Lys
435 440 445

Tyr Ser Leu Gln Phe Gly Phe Gly Pro Phe Glu Leu Pro Pro Gln Trp
450 455 460 

Leu Ser Glu Thr Arg Asn Ser Lys Lys Arg Leu Leu Pro Pro Leu Gly 
465 470 475 480 

Asn Thr Pro Glu Glu Leu Ile Gln Thr Lys Val Pro Lys Val Gly Arg
485 490 495 

Val Glu Arg Lys Met Ser Arg Asn Asn Lys Val Ser Ile Phe Pro Lys
500 505 510 

Val Asp Ser
515 

(106) INFORMATION FOR SEQ ID NO: 105:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:
GGAGAATTCA CTAGGCGAGG CGCTCCATC 29

(107) INFORMATION FOR SEQ ID NO: 106:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:
GGAGGATCCA GGAAACCTTA GGCCGAGTCC

(108) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

ATGAATCGGC ACCATCTGCA GGATCACTTT CTGGAAATAG ACAAGAAGAA CTGCTGTGTG 60

TTCCGAGATG ACTTCATTGC CAAGGTGTTG CCGCCGGTGT TGGGGCTGGA GTTTATCTTT 120

GGGCTTCTGG GCAATGGCCT TGCCCTGTGG ATTTTCTGTT TCCACCTCAA GTCCTGGAAA 180

TCCAGCCGGA TTTTCCTGTT CAACCTGGCA GTAGCTGACT TTCTACTGAT CATCTGCCTG 240

CCGTTCGTGA TGGACTACTA TGTGCGGCGT TCAGACTGGA ACTTTGGGGA CATCCCTTGC 300

CGGCTGGTGC TCTTCATGTT TGCCATGAAC CGCCAGGGCA GCATCATCTT CCTCACGGTG 360

GTGGCGGTAG ACAGGTATTT CCGGGTGGTC CATCCCCACC ACGCCCTGAA CAAGATCTCC 420

AATTGGACAG CAGCCATCAT CTCTTGCCTT CTGTGGGGCA TCACTGTTGG CCTAACAGTC 480

CACCTCCTGA AGAAGAAGTT GCTGATCCAG AATGGCCCTG CAAATGTGTG CATCAGCTTC 540

AGCATCTGCC ATACCTTCCG GTGGCACGAA GCTATGTTCC TCCTGGAGTT CCTCCTGCCC 600

CTGGGCATCA TCCTGTTCTG CTCAGCCAGA ATTATCTGGA GCCTGCGGCA GAGACAAATG 660

GACCGGCATG CCAAGATCAA GAGAGCCATC ACCTTCATCA TGGTGGTGGC CATCGTCTTT 720

GTCATCTGCT TCCTTCCCAG CGTGGTTGTG CGGATCCGCA TCTTCTGGCT CCTGCACACT 780

TCTGGCACGC AGAATTGTGA AGTGTACCGC TCGGTGGACC TGGCGTTCTT TATCACTCTC 840

AGCTTCACCT ACATGAACAG CATGCTGGAC CCCGTGGTGT ACTACTTCTC CAGCCCATCC 900

TTTCCCAACT TCTTCTCCAC TTTGATCAAC CGCTGCCTCC AGAGGAAGAT GACAGGTGAG 960 

CCAGATAATA ACCGCAGCAC GAGCGTCGAG CTCACAGGGG ACCCCAACAA AACCAGAGGC 1020

CCTCCAGAGG CGTTAATGGC CAACTCCGGT GAGCCATGGA GCCCCTCTTA TCTGGGCCCA 108 0 

ACCTCAAATA ACCATTCCAA GAAGGGACAT TGTCACCAAG AACCAGCATC TCTGGAGAAA 114 0 

CAGTTGGGCT GTTGCATCGA GTAA 1164
(109) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 
Met Asn Arg His His Leu Gln Asp His Phe Leu Glu Ile Asp Lys Lys
1 5 10 15

Asn Cys Cys Val Phe Arg Asp Asp Phe Ile Ala Lys Val Leu Pro Pro
20 25 30 

Val Leu Gly Leu Glu Phe Ile Phe Gly Leu Leu Gly Asn Gly Leu Ala
35 40 45 

Leu Trp Ile Phe Cys Phe His Leu Lys Ser Trp Lys Ser Ser Arg Ile
50 55 60 

Phe Leu Phe Asn Leu Ala Val Ala Asp Phe Leu Leu Ile Ile Cys Leu
65 70 75 80 

Pro Phe Val Met Asp Tyr Tyr Val Arg Arg Ser Asp Trp Asn Phe Gly 
85 90 95 

Asp Ile Pro Cys Arg Leu Val Leu Phe Met Phe Ala Met Asn Arg Gln
100 105 110 

Gly Ser Ile Ile Phe Leu Thr Val Val Ala Val Asp Arg Tyr Phe Arg
115 120 125 

Val Val His Pro His His Ala Leu Asn Lys Ile Ser Asn Trp Thr Ala
130 135 140 

Ala Ile Ile Ser Cys Leu Leu Trp Gly Ile Thr Val Gly Leu Thr Val
145 150 155 160 

His Leu Leu Lys Lys Lys Leu Leu Ile Gln Asn Gly Pro Ala Asn Val
165 170 175 

Cys Ile Ser Phe Ser Ile Cys His Thr Phe Arg Trp His Glu Ala Met
180 185 190 

Phe Leu Leu Glu Phe Leu Leu Pro Leu Gly Ile Ile Leu Phe Cys Ser
195 200 205 

Ala Arg Ile Ile Trp Ser Leu Arg Gln Arg Gln Met Asp Arg His Ala
210 215 220 

Lys Ile Lys Arg Ala Ile Thr Phe Ile Met Val Val Ala Ile Val Phe
225 230 235 240 

Val Ile Cys Phe Leu Pro Ser Val Val Val Arg Ile Arg Ile Phe Trp
245 250 255 

Leu Leu His Thr Ser Gly Thr Gln Asn Cys Glu Val Tyr Arg Ser Val
260 265 270 

Asp Leu Ala Phe Phe Ile Thr Leu Ser Phe Thr Tyr Met Asn Ser Met
275 280 285 



Leu Asp Pro Val Val Tyr Tyr Phe Ser Ser Pro Ser Phe Pro Asn Phe 
290 295 300 
Phe Ser Thr Leu Ile Asn Arg Cys Leu Gln Arg Lys Met Thr Gly Glu
305 310 315 320

Pro Asp Asn Asn Arg Ser Thr Ser Val Glu Leu Thr Gly Asp Pro Asn
325 330 335

Lys Thr Arg Gly Ala Pro Glu Ala Leu Met Ala Asn Ser Gly Glu Pro
340 345 350

Trp Ser Pro Ser Tyr Leu Gly Pro Thr Ser Asn Asn His Ser Lys Lys
355 360 365

Gly His Cys His Gln Glu Pro Ala Ser Leu Glu Lys Gln Leu Gly Cys
370 375 380

Cys Ile Glu
385

(110) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
ACCATGGCTT GCAATGGCAG TGCGGCCAGG GGGCACT 

(111) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110: 
CGACCAGGAC AAACAGCATC TTGGTCACTT GTCTCCGGC 

(112) INFORMATION FOR SEQ ID NO: 111: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 
GACCAAGATG CTGTTTGTCC TGGTCGTGGT GTTTGGCAT 

(113) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 
CGGAATTCAG GATGGATCGG TCTCTTGCTG CGCCT 

(114) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1212 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:



ATGGCTTGCA ATGGCAGTGC GGCCAGGGGG CACTTTGACC CTGAGGACTT GAACCTGACT 60

GACGAGGCAC TGAGACTCAA GTACCTGGGG CCCCAGCAGA CAGAGCTGTT CATGCCCATC 120

TGTGCCACAT ACCTGCTGAT CTTCGTGGTG GGCGCTGTGG GCAATGGGCT GACCTGTCTG 180

GTCATCCTGC GCCACAAGGC CATGCGCACG CCTACCAACT ACTACCTCTT CAGCCTGGCC 240

GTGTCGGACC TGCTGGTGCT GCTGGTGGGC CTGCCCCTGG AGCTCTATGA GATGTGGCAC 300

AACTACCCCT TCCTGCTGGC CGTTGGTGGC TGCTATTTCC GCACGCTACT GTTTGAGATG 360

GTCTGCCTGG CCTCAGTGCT CAACGTCACT GCCCTGAGCG TGGAACGCTA TGTGGCCGTG 420



GTGCACCCAC TCCAGGCCAG GTCCATGGTG ACGCGGGCCC ATGTGCGCCG AGTGCTTGGG 480 
GCCGTCTGGG GTCTTGCCAT GCTCTGCTCC CTGCCCAACA CCAGCCTGCA CGGCATCCGG 540

CAGCTGCACG TGCCCTGCCG GGGCCCAGTG CCAGACTCAG CTGTTTGCAT GCTGGTCCGC 600

CCACGGGCCC TCTACAACAT GGTAGTGCAG ACCACCGCGC TGCTCTTCTT CTGCCTGCCC 660 

ATGGCCATCA TGAGCGTGCT CTACCTGCTC ATTGGGCTGC GACTGCGGCG GGAGAGGCTG 720 

CTGCTCATGC AGGAGGCCAA GGGCAGGGGC TCTGCAGCAG CCAGGTCCAG ATACACCTGC 780 

AGGCTCCAGC AGCACGATCG GGGCCGGAGA CAAGTGACCA AGATGCTGTT TGTCCTGGTC 840

GTGGTGTTTG GCATCTGCTG GGCCCCGTTC CACGCCGACC GCGTCATGTG GAGCGTCGTG 900 

TCACAGTGGA CAGATGGCCT GCACCTGGCC TTCCAGCACG TGCACGTCAT CTCCGGCATC 960

TTCTTCTACC TGGGCTCGGC GGCCAACCCC GTGCTCTATA GCCTCATGTC CAGCCGCTTC 1020

CGAGAGACCT TCCAGGAGGC CCTGTGCCTC GGGGCCTGCT GCCATCGCCT CAGACCCCGC 1080

CACAGCTCCC ACAGCCTCAG CAGGATGACC ACAGGCAGCA CCCTGTGTGA TGTGGGCTCC 1140

CTGGGCAGCT GGGTCCACCC CCTGGCTGGG AACGATGGCC CAGAGGCGCA GCAAGAGACC 1200 

GATCCATCCT GA 1212
(115) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 403 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:

Met Ala Cys Asn Gly Ser Ala Ala Arg Gly His Phe Asp Pro Glu Asp
1 5 10 15

Leu Asn Leu Thr Asp Glu Ala Leu Arg Leu Lys Tyr Leu Gly Pro Gln
20 25 30 

Gln Thr Glu Leu Phe Met Pro Ile Cys Ala Thr Tyr Leu Leu Ile Phe
35 40 45 

Val Val Gly Ala Val Gly Asn Gly Leu Thr Cys Leu Val Ile Leu Arg
50 55 60 

His Lys Ala Met Arg Thr Pro Thr Asn Tyr Tyr Leu Phe Ser Leu Ala
65 70 75 80 

Val Ser Asp Leu Leu Val Leu Leu Val Gly Leu Pro Leu Glu Leu Tyr 
85 90 95 
Glu Met Trp His Asn Tyr Pro Phe Leu Leu Gly Val Gly Gly Cys Tyr
100 105 110 

Phe Arg Thr Leu Leu Phe Glu Met Val Cys Leu Ala Ser Val Leu Asn 
115 120 125 

Val Thr Ala Leu Ser Val Glu Arg Tyr Val Ala Val Val His Pro Leu 
130 135 140 

Gln Ala Arg Ser Met Val Thr Arg Ala His Val Arg Arg Val Leu Gly
145 150 155 160

Ala Val Trp Gly Leu Ala Met Leu Cys Ser Leu Pro Asn Thr Ser Leu 
165 170 175 

His Gly Ile Arg Gln Leu His Val Pro Cys Arg Gly Pro Val Pro Asp
180 185 190 

Ser Ala Val Cys Met Leu Val Arg Pro Arg Ala Leu Tyr Asn Met Val 
195 200 205

Val Gln Thr Thr Ala Leu Leu Phe Phe Cys Leu Pro Met Ala Ile Met
210 215 220 

Ser Val Leu Tyr Leu Leu Ile Gly Leu Arg Leu Arg Arg Glu Arg Leu
225 230 235 240 

Leu Leu Met Gln Glu Ala Lys Gly Arg Gly Ser Ala Ala Ala Arg Ser
245 250 255 

Arg Tyr Thr Cys Arg Leu Gln Gln His Asp Arg Gly Arg Arg Gln Val
260 265 270 

Thr Lys Met Leu Phe Val Leu Val Val Val Phe Gly Ile Cys Trp Ala
275 280 285 

Pro Phe His Ala Asp Arg Val Met Trp Ser Val Val Ser Gln Trp Thr
290 295 300 

Asp Gly Leu His Leu Ala Phe Gln His Val His Val Ile Ser Gly Ile
305 310 315 320 

Phe Phe Tyr Leu Gly Ser Ala Ala Asn Pro Val Leu Tyr Ser Leu Met 
325 330 335 

Ser Ser Arg Phe Arg Glu Thr Phe Gln Glu Ala Leu Cys Leu Gly Ala
340 345 350 

Cys Cys His Arg Leu Arg Pro Arg His Ser Ser His Ser Leu Ser Arg
355 360 365 

Met Thr Thr Gly Ser Thr Leu Cys Asp Val Gly Ser Leu Gly Ser Trp 
370 375 380 



Val His Pro Leu Ala Gly Asn Asp Gly Pro Glu Ala Gln Gln Glu Thr
385 390 395 400

Asp Pro Ser 

(116) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 
GGAAGCTTCA GGCCCAAAGA TGGGGAACAT 30

(117) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:

GTGGATCCAC CCGCGGAGGA CCCAGGCTAG 30

(118) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1098 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 

ATGGGGAACA TCACTGCAGA CAACTCCTCG ATGAGCTGTA CCATCGACCA TACCATCCAC 60

CAGACGCTGG CCCCGGTGGT CTATGTTACC GTGCTGGTGG TGGGCTTCCC GGCCAACTGC 120

CTGTCCCTCT ACTTCGGCTA CCTGCAGATC AAGGCCCGGA ACGAGCTGGG CGTGTACCTG 180 

TGCAACCTGA CGGTGGCCGA CCTCTTCTAC ATCTGCTCGC TGCCCTTCTG GCTGCAGTAC 240

GTGCTGCAGC ACGACAACTG GTCTCACGGC GACCTGTCCT GCCAGGTGTG CGGCATCCTC 300

CTGTACGAGA ACATCTACAT CAGCGTGGGC TTCCTCTGCT GCATCTCCGT GGACCGCTAC 360
CTGCrTGTGG CCCATCCCTT CCGCTTCCAC CAGTTCCGGA CCCTGAAGGC GGCCGTCGGC 420

GTCAGCGTGG TCATCTGGGC CAAGGAGCTG CTGACCAGCA TCTACTTCCT GATGCACGAG 480

GAGGTCATCG AGGACGAGAA CCAGCACCGC GTGTGCTTTG AGCACTACCC CATCCAGGCA 540

TGGCAGCGCG CCATCAACTA CTACCGCTTC CTGGTGGGCT TCCTCTTCCC CATCTGCCTG 600

CTGCTGGCGT CCTACCAGGG CATCCTGCGC GCCGTGCGCC GGAGCCACGG CACCCAGAAG 660

AGCCGCAAGG ACCAGATCCA GCGGCTGGTG CTCAGCACCG TGGTCATCTT CCTGGCCTGC 720

TTCCTGCCCT ACCACGTGTT GCTGCTGGTG CGCAGCGTCT GGGAGGCCAG CTGCGACTTC 780

GCCAAGGGCG TTTTCAACGC CTACCACTTC TCCCTCCTGC TCACCAGCTT CAACTGCGTC 840

GCCGACCCCG TGCTCTACTG CTTCGTCAGC GAGACCACCC ACCGGGACCT GGCCCGCCTC 900 

CGCGGGGCCT GCCTGGCCTT CCTCACCTGC TCCAGGACCG GCCGGGCCAG GGAGGCCTAC 960

CCGCTGGGTG CCCCCGAGGC CTCCGGGAAA AGCGGGGCCC AGGGTGAGGA GCCCGAGCTG 1020

TTGACCAAGC TCCACCCGGC CTTCCAGACC CCTAACTCGC CAGGGTGGGG CGGQTTCCCC 1080

ACGGGCAGGT TGGCCTAG 1098
(119) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 365 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 

Met Gly Asn Ile Thr Ala Asp Asn Ser Ser Met Ser Cys Thr Ile Asp
1 5 10 15

His Thr Ile His Gln Thr Leu Ala Pro Val Val Tyr Val Thr Val Leu
20 25 30 

Val Val Gly Phe Pro Ala Asn Cys Leu Ser Leu Tyr Phe Gly Tyr Leu
35 40 45 

Gln Ile Lys Ala Arg Asn Glu Leu Gly Val Tyr Leu Cys Asn Leu Thr
50 55 60 

Val Ala Asp Leu Phe Tyr Ile Cys Ser Leu Pro Phe Trp Leu Gln Tyr
65 70 75 80 

Val Leu Gln His Asp Asn Trp Ser His Gly Asp Leu Ser Cys Gln Val

85 90 95 



WO 00/22129



PCT/US99/23938 




Cys Gly Ile Leu Leu Tyr Glu Asn Ile Tyr Ile Ser Val Gly Phe Leu
100 105 110 

Cys Cys Ile Ser Val Asp Arg Tyr Leu Ala Val Ala His Pro Phe Arg
115 120 125 

Phe His Gln Phe Arg Thr Leu Lys Ala Ala Val Gly Val Ser Val Val
130 135 140 

Ile Trp Ala Lys Glu Leu Leu Thr Ser Ile Tyr Phe Leu Met His Glu
145 150 155 160 

Glu Val Ile Glu Asp Glu Asn Gln His Arg Val Cys Phe Glu His Tyr
165 170 175 

Pro Ile Gln Ala Trp Gln Arg Ala Ile Asn Tyr Tyr Arg Phe Leu Val
180 185 190

Gly Phe Leu Phe Pro Ile Cys Leu Leu Leu Ala Ser Tyr Gln Gly Ile
195 200 205 

Leu Arg Ala Val Arg Arg Ser His Gly Thr Gln Lys Ser Arg Lys Asp
210 215 220 

Gln Ile Gln Arg Leu Val Leu Ser Thr Val Val Ile Phe Leu Ala Cys
225 230 235 240 

Phe Leu Pro Tyr His Val Leu Leu Leu Val Arg Ser Val Trp Glu Ala 
245 250 255 

Ser Cys Asp Phe Ala Lys Gly Val Phe Asn Ala Tyr His Phe Ser Leu 
260 265 270 

Leu Leu Thr Ser Phe Asn Cys Val Ala Asp Pro Val Leu Tyr Cys Phe 
275 280 285 

Val Ser Glu Thr Thr His Arg Asp Leu Ala Arg Leu Arg Gly Ala Cys 
290 295 300 

Leu Ala Phe Leu Thr Cys Ser Arg Thr Gly Arg Ala Arg Glu Ala Tyr 
305 310 315 320 

Pro Leu Gly Ala Pro Glu Ala Ser Gly Lys Ser Gly Ala Gln Gly Glu
325 330 335

Glu Pro Glu Leu Leu Thr Lys Leu His Pro Ala Phe Gln Thr Pro Asn
340 345 350 

Ser Pro Gly Ser Gly Gly Phe Pro Thr Gly Arg Leu Ala 
355 360 365 

(120) INFORMATION FOR SEQ ID NO: 119:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:



GACCTCGAGT CCTTCTACAC CTCATC 
(121) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 



TGCTCTAGAT TCCAGATAGG TGAAAACTTG 30

(122) INFORMATION FOR SEQ ID NO: 121:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1416 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:



ATGGATATTC TTTGTGAAGA AAATACTTCT TTGAGCTCAA CTACGAACTC CCTAATGCAA 60

TTAAATGATG ACAACAGGCT CTACAGTAAT GACTTTAACT CCGGAGAAGC TAACACTTCT 120

GATGCATTTA ACTGGACAGT CGACTCTGAA AATCGAACCA ACCTTTCCTG TGAAGGGTGC 180

CTCTCACCGT CGTGTCTCTC CTTACTTCAT CTCCAGGAAA AAAACTGGTC TGCTTTACTG 240

ACAGCCGTAG TGATTATTCT AACTATTGCT GGAAACATAC TCGTCATCAT GGCAGTGTCC 300

CTAGAGAAAA AGCTGCAGAA TGCCACCAAC TATTTCCTGA TGTCACTTGC CATAGCTGAT 360

ATGCTGCTGG GTTTCCTTGT CATGCCCGTG TCCATGTTAA CCATCCTGTA TGGGTACCGG 420

TGGCCTCTGC CGAGCAAGCT TTGTGCAGTC TGGATTTACC TGGACGTGCT CTTCTCCACG 480

GCCTCCATCA TGCACCTCTG CGCCATCTCG CTGGACCGCT ACGTCGCCAT CCAGAATCCC 540

ATCCACCACA GCCGCTTCAA CTCCAGAACT AAGGCATTTC TGAAAATCAT TGCTGTTTGG 600
ACCATATCAG TAGGTATATC CATGCCAATA CCAGTCTTTG GGCTACAGGA CGATTCGAAG 660

GTCTTTAAGG AGGGGAGTTG CTTACTCGCC GATGATAACT TTGTCCTGAT CGGCTCTTTT 720

GTGTCATTTT TCATTCCCTT AACCATCATG GTGATCACCT ACTTTCTAAC TATCAAGTCA 780

CTCCAGAAAG AAGCTACTTT GTGTGTAAGT GATCTTGGCA CACGGGCCAA ATTAGCTTCT 840

TTCAGCTTCC TCCCTCAGAG TTCTTTGTCT TCAGAAAAGC TCTTCCAGCG GTCGATCCAT 900

AGGGAGCCAG GGTCCTACAC AGGCAGGAGG ACTATGCAGT CCATCAGCAA TGAGCAAAAG 960

GCATGCAAGG TGCTGGGCAT CGTCTTCTTC CTGTTTGTGG TGATGTGGTG CCCTTTCTTC 1020

ATCACAAACA TCATGGCCGT CATCTGCAAA GAGTCCTGCA ATGAGGATGT CATTGGGGCC 1080

CTGCTCAATG TGTTTGTTTG GATCGGTTAT CTCTCTTCAG CAGTCAACGC ACTAGTCTAC 1140

ACACTGTTCA ACAAGACCTA TAGGTCAGCC TTTTCACGGT ATATTCAGTG TCAGTACAAG 1200

GAAAACAAAA AACCATTGCA GTTAATTTTA GTGAACACAA TACCGGCTTT GGCCTACAAG 1260

TCTAGCCAAC TTCAAATGGG ACAAAAAAAG AATTCAAAGC AAGATGCCAA GACAACAGAT 1320

AATGACTGCT CAATGGTTGC TCTAGGAAAG CAGTATTCTG AAGAGGCTTC TAAAGACAAT 1380

AGCGACGGAG TGAATGAAAA GGTGAGCTGT GTGTGA 1416 
(123) INFORMATION FOR SEQ ID NO: 122:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 471 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 

Met Asp Ile Leu Cys Glu Glu Asn Thr Ser Leu Ser Ser Thr Thr Asn
1 5 10 15

Ser Leu Met Gln Leu Asn Asp Asp Asn Arg Leu Tyr Ser Asn Asp Phe
20 25 30 

Asn Ser Gly Glu Ala Asn Thr Ser Asp Ala Phe Asn Trp Thr Val Asp 
35 40 45 

Ser Glu Asn Arg Thr Asn Leu Ser Cys Glu Gly Cys Leu Ser Pro Ser
50 55 60 

Cys Leu Ser Leu Leu His Leu Gln Glu Lys Asn Trp Ser Ala Leu Leu
65 70 75 80 
Thr Ala Val Val Ile Ile Leu Thr Ile Ala Gly Asn Ile Leu Val Ile
85 90 95 

Met Ala Val Ser Leu Glu Lys Lys Leu Gln Asn Ala Thr Asn Tyr Phe
100 105 110 

Leu Met Ser Leu Ala Ile Ala Asp Met Leu Leu Gly Phe Leu Val Met
115 120 125 

Pro Val Ser Met Leu Thr Ile Leu Tyr Gly Tyr Arg Trp Pro Leu Pro
130 135 140 

Ser Lys Leu Cys Ala Val Trp Ile Tyr Leu Asp Val Leu Phe Ser Thr
145 150 155 160

Ala Ser Ile Met His Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala
165 170 175 

Ile Gln Asn Pro Ile His His Ser Arg Phe Asn Ser Arg Thr Lys Ala
180 185 190

Phe Leu Lys Ile Ile Ala Val Trp Thr Ile Ser Val Gly Ile Ser Met
195 200 205 

Pro Ile Pro Val Phe Gly Leu Gln Asp Asp Ser Lys Val Phe Lys Glu
210 215 220 

Gly Ser Cys Leu Leu Ala Asp Asp Asn Phe Val Leu Ile Gly Ser Phe
225 230 235 240 

Val Ser Phe Phe Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Phe Leu
245 250 255 

Thr Ile Lys Ser Leu Gln Lys Glu Ala Thr Leu Cys Val Ser Asp Leu
260 265 270 

Gly Thr Arg Ala Lys Leu Ala Ser Phe Ser Phe Leu Pro Gln Ser Ser
275 280 285

Leu Ser Ser Glu Lys Leu Phe Gln Arg Ser Ile His Arg Glu Pro Gly
290 295 300 

Ser Tyr Thr Gly Arg Arg Thr Met Gln Ser Ile Ser Asn Glu Gln Lys
305 310 315 320 

Ala Cys Lys Val Leu Gly Ile Val Phe Phe Leu Phe Val Val Met Trp
325 330 335 

Cys Pro Phe Phe Ile Thr Asn Ile Met Ala Val Ile Cys Lys Glu Ser
340 345 350 

Cys Asn Glu Asp Val Ile Gly Ala Leu Leu Asn Val Phe Val Trp Ile
355 360 365 



Gly Tyr Leu Ser Ser Ala Val Asn Pro Leu Val Tyr Thr Leu Phe Asn 
370 375 380

Lys Thr Tyr Arg Ser Ala Phe Ser Arg Tyr Ile Gln Cys Gln Tyr Lys
385 390 395 400

Glu Asn Lys Lys Pro Leu Gln Leu Ile Leu Val Asn Thr Ile Pro Ala
405 410 415 

Leu Ala Tyr Lys Ser Ser Gln Leu Gln Met Gly Gln Lys Lys Asn Ser
420 425 430 

Lys Gln Asp Ala Lys Thr Thr Asp Asn Asp Cys Ser Met Val Ala Leu
435 440 445 

Gly Lys Gln Tyr Ser Glu Glu Ala Ser Lys Asp Asn Ser Asp Gly Val
450 455 460 

Asn Glu Lys Val Ser Cys Val 
465 470 

(124) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 
GACCTCGAGG TTGCTTAAGA CTGAAGC 
(125) INFORMATION FOR SEQ ID NO: 124:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124: 
ATTTCTAGAC ATATGTAGCT TGTACCG 
(126) INFORMATION FOR SEQ ID NO: 125:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1377 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125: 

ATGGTGAACC TGAGGAATGC GGTGCATTCA TTCCTTGTGC ACCTAATTGG CCTATTGGTT 60

TGGCAATGTG ATATTTCTGT GAGCCCAGTA GCAGCTATAG TAACTGACAT TTTCAATACC 120

TCCGATGGTG GACGCTTCAA ATTCCCAGAC GGGGTACAAA ACTGGCCAGC ACTTTCAATC 180

GTCATCATAA TAATCATGAC AATAGGTGGC AACATCCTTG TGATCATGGC AGTAAGCATG 240

GAAAAGAAAC TGCACAATGC CACCAATTAC TTCTTAATGT CCCTAGCCAT TGCTGATATG 300

CTAGTGGGAC TACTTGTCAT GCCCCTGTCT CTCCTGGCAA TCCTTTATGA TTATGTCTGG 360

CCACTACCTA GATATTTGTG CCCCGTCTGG ATTTCTTTAG ATGTTTTATT TTCAACAGCG 420

TCCATCATGC ACCTCTGCGC TATATCGCTG GATCGGTATG TAGCAATACG TAATCCTATT 480 

GAGCATAGCC GTTTCAATTC GCGGACTAAG GCCATCATGA AGATTGCTAT TGTTTGGGCA i)4 0 

ATTTCTATAG GTGTATCAGT TCCTATCCCT GTGATTGGAC TGAGGGACGA AGAAAAGGTG bOO 

TTCGTGAACA ACACGACGTG CGTGCTCAAC GACCCAAATT TCGTTCTTAT TGGGTCCTTC 660 

GTAGCTTTCT TCATACCGCT GACGATTATG GTGATTACGT ATTGCCTGAC CATCTACGTT '/2 0 

CTGCGCCGAC AAGCTTTGAT GTTACTGCAC GGCCACACCG AGGAACCGCC TGGACTAAGT "7 8 0 

CTGGATTTCC TGAAGTGCTG CAAGAGGAAT ACGGCCGAGG AAGAGAACTC TGCAAACCCT H4 0 

AACCAAGACC AGAACGCACG CCGAAGAAAG AAGAAGGAGA GACGTCCTAG GGGCACCATG ^*00 

CAGGCTATCA ACAATGAAAG AAAAGCTTCG AAAGTCCTTG GGATTGTTTT CTTTGTGTTT 96 0 

CTGATCATGT GGTGCCCATT TTTCATTACC AATATTCTGT CTGTTCTTTG TGAGAAGTCC 102 0 

TGTAACCAAA AGCTCATGGA AAAGCTTCTG AATGTGTTTG TTTGGATTGG CTATGTTTGT 10 8 0 

TCAGGAATCA ATCCTCTGGT GTATACTCTG TTCAACAAAA TTTACCGAAG GGCATTCTCC 114 0 

AACTATTTGC GTTGCAATTA TAAGGTAGAG AAAAAGCCTC CTGTCAGGCA GATTCCAAGA 12 00 

GTTGCCGCCA CTGCTTTGTC TGGGAGGGAG CTTAATGTTA ACATTTATCG GCATACCAAT 126 0 

GAACCGGTGA TCGAGAAAGC CAGTGACAAT GAGCCCGGTA TAGAGATGCA AGTTGAGAAT 1320 

TTAGAGTTAC CAGTAAATCC CTCCAGTGTG GTTAGCGAAA GGATTAGCAG TGTGTGA 13 77 
(127) INFORMATION FOR SEQ ID NO: 126:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 458 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

Met Val Asn Leu Arg Asn Ala Val His Ser Phe Leu Val His Leu Ile
1 5 10 15

Gly Leu Leu Val Trp Gln Cys Asp Ile Ser Val Ser Pro Val Ala Ala
20 25 30

Ile Val Thr Asp Ile Phe Asn Thr Ser Asp Gly Gly Arg Phe Lys Phe
35 40 45

Pro Asp Gly Val Gln Asn Trp Pro Ala Leu Ser Ile Val Ile Ile Ile
50 55 60

Ile Met Thr Ile Gly Gly Asn Ile Leu Val Ile Met Ala Val Ser Met
65 70 75 80

Glu Lys Lys Leu His Asn Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala
85 90 95

Ile Ala Asp Met Leu Val Gly Leu Leu Val Met Pro Leu Ser Leu Leu
100 105 110

Ala Ile Leu Tyr Asp Tyr Val Trp Pro Leu Pro Arg Tyr Leu Cys Pro
115 120 125

Val Trp Ile Ser Leu Asp Val Leu Phe Ser Thr Ala Ser Ile Met His
130 135 140

Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala Ile Arg Asn Pro Ile
145 150 155 160

Glu His Ser Arg Phe Asn Ser Arg Thr Lys Ala Ile Met Lys Ile Ala
165 170 175

Ile Val Trp Ala Ile Ser Ile Gly Val Ser Val Pro Ile Pro Val Ile
180 185 190

Gly Leu Arg Asp Glu Glu Lys Val Phe Val Asn Asn Thr Thr Cys Val
195 200 205

Leu Asn Asp Pro Asn Phe Val Leu Ile Gly Ser Phe Val Ala Phe Phe
210 215 220

Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Cys Leu Thr Ile Tyr Val
225 230 235 240

Leu Arg Arg Gln Ala Leu Met Leu Leu His Gly His Thr Glu Glu Pro
245 250 255

Pro Gly Leu Ser Leu Asp Phe Leu Lys Cys Cys Lys Arg Asn Thr Ala
260 265 270

Glu Glu Glu Asn Ser Ala Asn Pro Asn Gln Asp Gln Asn Ala Arg Arg
275 280 285

Arg Lys Lys Lys Glu Arg Arg Pro Arg Gly Thr Met Gln Ala Ile Asn
290 295 300

Asn Glu Arg Lys Ala Ser Lys Val Leu Gly Ile Val Phe Phe Val Phe
305 310 315 320

Leu Ile Met Trp Cys Pro Phe Phe Ile Thr Asn Ile Leu Ser Val Leu
325 330 335

Cys Glu Lys Ser Cys Asn Gln Lys Leu Met Glu Lys Leu Leu Asn Val
340 345 350

Phe Val Trp Ile Gly Tyr Val Cys Ser Gly Ile Asn Pro Leu Val Tyr
355 360 365

Thr Leu Phe Asn Lys Ile Tyr Arg Arg Ala Phe Ser Asn Tyr Leu Arg
370 375 380

Cys Asn Tyr Lys Val Glu Lys Lys Pro Pro Val Arg Gln Ile Pro Arg
385 390 395 400

Val Ala Ala Thr Ala Leu Ser Gly Arg Glu Leu Asn Val Asn Ile Tyr
405 410 415

Arg His Thr Asn Glu Pro Val Ile Glu Lys Ala Ser Asp Asn Glu Pro
420 425 430

Gly Ile Glu Met Gln Val Glu Asn Leu Glu Leu Pro Val Asn Pro Ser
435 440 445

Ser Val Val Ser Glu Arg Ile Ser Ser Val
450 455

(128) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:

GGTAAGCTTG GCAGTCCACC CCAGGCCTTC



(129) INFORMATION FOR SEQ ID NO: 128:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:
TCCGAATTCT CTGTAGACAC AAGGCTTTCIG 30
(130) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1068 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:

ATGGATCAGT TCCCTGAATC AGTGACACAA AACTTTGAGT ACGATGATTT GGCTGAGGCC 60

TGTTATATTG GCGACATCGT GGTCTTTGGG ACTGTGTTCC TGTCCATATT CTACTCCGTC 120

AT-'TTTGCCA TTGGCCTGGT GGGAAATTTG TTGGTAGT :rr TTGCCCTCAC CAACAGCAAG 180

AAGCCCAAGA GTGTCACCGA CATTTACCTC CTGAACCTGG CCTTG-^C^GA TCTGCTGTTT 240

GTAGCCACTT TGCCCTTCTG GACTCAGTAT TTGAT;^AATo A/yVtGl^GCCT CCACAATGCC 300

ATGTGCAAAC TCACTACCGC CTTCTTCTTC ATCGGCTTTT TTGGAAGCAT ATTCTTCATC 360

ACCGTCATCA GCATTGATAG GTACCTGGCC ATCGTCCT:-:G CCGCCAACTC CATGAACAAC 420

CGGACCGTGC AGCATGGCGT CACCATCAGC CTAGGGGTZT GGGCAGCAGC CATTTTGGTG 480

GCAGCACCCC AGTTCATGTT CACAAAGCAG AAAGAAAATG AATGCCTTGG TGACTACCCC 540

GAGGTCCTGC AGGAAATCTG GCCCGTGCTC CGCAATGTGG AAACAAATTT TCTTGGCTTC 600

CTACTCCCCC TGCTCATTAT GAGTTATTGC TACTTCAGAA TCATCCAGA: GCTGTTTTCC 660

TGCAAGAACC ACAAGAAAGC CAAAGCCATT AAACTGATCC TTCTi'^GTGGT 'JATCGTGTTT 720

TTCCTCTTCT GGACACCCTA CAACGTTATG ATTTTCCTGG AGAC'JCTTAA GCTCTATGAC 780

TTCTTTCCCA GTTGTGACAT GAGGAAGGAT CTGAGGCTCG CCCTCAGTGT GACTGAGACG 840

GTTGCATTTA GCCATTGTTG CCTGAATGGT CTCATCTATG CATTTGCTGG GGAGAAGTTC 900

AGAAGATACC TTTACCACCT GTATGGGAAA TGCCTGGCTG TCCTGTGTGG GCGCTCAGTC 960
CACGTTGATT TL-TCCTCATC TGAATCACAA AGGAGCAGGC ATGGAAGTGT TCTGAGCAGC 1020
AATTTTAGTT ACCACACGAG TGATGGAGAT GCATTGCTCC TTCTCTGA 1068
(131) INFORMATION FOR SEQ ID NO: 130:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

Met Asp Gln Phe Pro Glu Ser Val Thr Gln Asn Phe Glu Tyr Asp Asp
1 5 10 15

Leu Ala Glu Ala Cys Tyr Ile Gly Asp Ile Val Val Phe Gly Thr Val
20 25 30

Phe Leu Ser Ile Phe Tyr Ser Val Ile Phe Ala Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Val Val Phe Ala Leu Thr Asn Ser Lys Lys Pro Lys Ser
50 55 60

Val Thr Asp Ile Tyr Leu Leu Asn Leu Ala Leu Ser Asp Leu Leu Phe
65 70 75 80

Val Ala Thr Leu Pro Phe Trp Thr His Tyr Leu Ile Asn Glu Lys Gly
85 90 95

Leu His Asn Ala Met Cys Lys Phe Thr Thr Ala Phe Phe Phe Ile Gly
100 105 110

Phe Phe Gly Ser Ile Phe Phe Ile Thr Val Ile Ser Ile Asp Arg Tyr
115 120 125

Leu Ala Ile Val Leu Ala Ala Asn Ser Met Asn Asn Arg Thr Val Gln
130 135 140

His Gly Val Thr Ile Ser Leu Gly Val Trp Ala Ala Ala Ile Leu Val
145 150 155 160

Ala Ala Pro Gln Phe Met Phe Thr Lys Gln Lys Glu Asn Glu Cys Leu
165 170 175

Gly Asp Tyr Pro Glu Val Leu Gln Glu Ile Trp Pro Val Leu Arg Asn
180 185 190

Val Glu Thr Asn Phe Leu Gly Phe Leu Leu Pro Leu Leu Ile Met Ser
195 200 205

Tyr Cys Tyr Phe Arg Ile Ile Gln Thr Leu Phe Ser Cys Lys Asn His
210 215 220

Lys Lys Ala Lys Ala Ile Lys Leu Ile Leu Leu Val Val Ile Val Phe
225 230 235 240

Phe Leu Phe Trp Thr Pro Tyr Asn Val Met Ile Phe Leu Glu Thr Leu
245 250 255

Lys Leu Tyr Asp Phe Phe Pro Ser Cys Asp Met Arg Lys Asp Leu Arg
260 265 270

Leu Ala Leu Ser Val Thr Glu Thr Val Ala Phe Ser His Cys Cys Leu
275 280 285

Asn Pro Leu Ile Tyr Ala Phe Ala Gly Glu Lys Phe Arg Arg Tyr Leu
290 295 300

Tyr His Leu Tyr Gly Lys Cys Leu Ala Val Leu Cys Gly Arg Ser Val
305 310 315 320

His Val Asp Phe Ser Ser Ser Glu Ser Gln Arg Ser Arg His Gly Ser
325 330 335

Val Leu Ser Ser Asn Phe Thr Tyr His Thr Ser Asp Gly Asp Ala Leu
340 345 350

Leu Leu Leu
355



(132) INFORMATION FOR SEQ ID NO: 131:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:

GATCTCCAGT AGGCATAAGT GGACAATTCT GG
(133) INFORMATION FOR SEQ ID NO: 132:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
CTCCTTCGGT CCTCCTATCC TTGTCAGAAG

(134) INFORMATION FOR SEQ ID NO: 133:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:

AGAAGGCCAA GATCGCGCGG CTGGCCCTCA

(135) INFORMATION FOR SEQ ID NO: 134:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:

CGGCGCCACC CCACGAAAAA GCTCATCTTC

(136) INFORMATION FOR SEQ ID NO: 135:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:
GCCAAGAAGC GGGTGAAGTT CCTGGTGGTG GCA 
(137) INFORMATION FOR SEQ ID NO: 136:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:

CAGGCGGAAG GTGAAAGT::C TGGTCCTCGT

(138) INFORMATION FOR SEQ ID NO: 137:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:
CGGCGCCTGC GGGCL\AA3CG GCTGGTGGTG GTG 
(139) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:

CCAAGCACAA AC:CC;w\GA/'-A GTGACCATCA C 

(140) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:

GCGCCGGCGC ACCAAATGCT TGCTGGTG3T 

(141) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
CAAAAAGCTG AAGAAATCTA AGAAGATCAT CTTTATTGTC G

(142) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:
CAAGACCAAG GCAAAACGCA TGATCGCCAT

(143) INFORMATION FOR SEQ ID NO: 142:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:

GTCAAGGAGA AGTCCAAAAG GATCATCATC

(144) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
CGCCGCGTGC GGGCCAAGCA GCTCCTGCTC 

(145) INFORMATION FOR SEQ ID NO: 144:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:
CCTGATAACC (jCTAT/'iAA.AT GGTCCTGTTT CGA
(146) INFORMATION FOR SEQ ID NO: 145:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:

GAAAGACAAA AGAGACTCAA GAGGATGTCT TTATTG

(147) INFORMATION FOR SEQ ID NO: 146:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146: 

CGGAGAAAGA GCGTGAAACG CACAGCCATC GCC 

(148) INFORMATION FOR SEQ ID NO: 147:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147: 
AAGCTTCAGC GGGCCAAGGC ACTGGTCACC 
(149) INFORMATION FOR SEQ ID NO: 148:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:
CAGCGGCAGA AGGC.VuW\G GGTGGCCATC 30

(150) INFORMATION FOR SEQ ID NO: 149:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149:
CGGCAGAAGG CGAAGCGCAT GATCCTCGCO 30

(151) INFORMATION FOR SEQ ID NO: 150:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150:

GAGCGCAACA AGGCCAAAAA GGTGATCATC 30

(152) INFORMATION FOR SEQ ID NO: 151:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
GGTGTAAACA AAAAGGCT/lA /JVACACAATT ATTCTTATT 39 

(153) INFORMATION FOR SEQ ID NO: 152:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:
GAGAGCCAGC TCAAGAGCAC CGTGGTG 
(154) INFORMATION FOR SEQ ID NO: 153:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:
CCACAACCAA ACCAAGAAAA TGCTGGCTGT 
(155) INFORMATION FOR SEQ ID NO: 154:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:
CATCAAGTGT ATCATGTGCC AAGTACGCCC

(156) INFORMATION FOR SEQ ID NO: 155:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: 
CTAGAGAGTC AGATGAAGTG TACAGTAGTG GCAC 

(157) INFORMATION FOR SEQ ID NO: 156:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:

CGGACAAAAG TGAAAACTAA AAAGATGTTC CTCATT

(158) INFORMATION FOR SEQ ID NO: 157:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157:
GCTGAGGTTC GCAATAAACT AACCATGTTT GTG 
(159) INFORMATION FOR SEQ ID NO: 158: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:
GGGAGGCCGA GCTGAAAGCC ACCCTGCTC 
(160) INFORMATION FOR SEQ ID NO: 159:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:
CAAGATCAAG AGAGCCAAAJ^ CCTTCATCAT G 
(161) INFORMATION FOR SEQ ID NO: 160:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:
CCGCAGACAA GTGAAGAAGA TGCTGTTTGT C 31

(162) INFORMATION FOR SEQ ID NO: 161: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161:

GCAAGGACCA GATCAAGCGC CTGGTGCTCA 30

(163) INFORMATION FOR SEQ ID NO: 162:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:
CAAGAAAGCC AAAGCCAAGA AACTGATCCT TCTG 34 
(164) INFORMATION FOR SEQ ID NO: 163:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1068 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:

ATGGAAGATT TGGAGGAAAC ATTATTTGAA GAATTTGAAA ACTATTCCTA TGACCTAGAC 60

TATTACTCTC TGGAGTCTGA TTTGGAGGAG AAAGTCCAGC TGGGAGTTGT TCACTGGGTC 120

TCCCTGGTGT TATATTGTTT GGCTTTTGTT CTGGGTATTC CAGGAAATGC CATCGTCATT 180

TGGTTCACGG GGCTCAAGTG GAAGAAGACA GTCACCACTC TGTGGTTCCT CAATCTAGCC 240

ATTGCGGATT TCATTTTTCT TCTCTTTCTG CCCCTGTACA TCTCCTAl'GT GGCCATGAAT 300
TTCCACTGGC CCTTTGGCAT CTGGCTGTGC AAAGCCVa'T CCTTr^ACTGC CCAGTTGA/LG 360

ATGTTTGCCA GTGTTTTTTT CCTGACAGTG ATCAC^CTO:^ AGCAGTATA:' GCACTTGATC 420

CATCCTGTC" TATCTCATCG GCATGGAACC CTCAAGAACT CTCTGATTGT GATTATATTC 480

ATCTGGGTTT TGGCTTC^CT AATTGGCGGT CCTGCCCT2T AGTTCCGOGA CACTGTGGAG 540

TTCAATAATC ATACTCTTTG ^TATAACAAT TTTCAGAAGC ATGATCCTGA CCTGACTTTG 600

ATCAGGCAJC ATGTTCTGAG TTGGGTGAAA TTTATGATTG GCTATCTCTT CC2TTTCCTA 660

ACAATGAGTA TTTGCTAGTT GTGT^JTGATC TTCAAGGT<JA AGAAGCGAAC AGTCCTGATC 720

TCCAGTAGGC ATAAGTGGAC AATTCTGGTT GTGGTTGTGG C'rTTTGTGGT TTGCTGGACT 780

CCTTAxTCAGC TGTTTAGGAT TTGGGAGGTG ACCATTGAGC AGA.A7'AGGTA TTCGCACCAT 840

GTGATGCAGG GTGGAATCCC CCTCTCCAC^T GGTTTGGCAT TCGTCAATAG TTGCTTGAAG 900

CCCATCCTTT ATGTCGTA.Vr TAGTA;^GA„AG TTCC?J^,GCTG Gg^'I TCCGGTG CTCAGTTGC'l' 960

GAGATACTCA AGTACACAGT GTGGGAAGTG AGCTGTTGTG il^GAGAGTGAG tgaAGAGCTG 1020

AGGAACTGAG /vAACCAACAA TCTGTGTGTG CTGGAA/iCAC^ i'^CT^JXTAA 1068
(165) INFORMATION FOR SEQ ID NO: 164:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:
Met Glu Asp Leu Glu Glu Thr Leu Phe Glu Glu Phe Glu Asn Tyr Ser
1 5 10 15

Tyr Asp Leu Asp Tyr Tyr Ser Leu Glu Ser Asp Leu Glu Glu Lys Val
20 25 30

Gln Leu Gly Val Val His Trp Val Ser Leu Val Leu Tyr Cys Leu Ala
35 40 45

Phe Val Leu Gly Ile Pro Gly Asn Ala Ile Val Ile Trp Phe Thr Gly
50 55 60

Leu Lys Trp Lys Lys Thr Val Thr Thr Leu Trp Phe Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Phe Ile Phe Leu Leu Phe Leu Pro Leu Tyr Ile Ser Tyr
85 90 95

Val Ala Met Asn Phe His Trp Pro Phe Gly Ile Trp Leu Cys Lys Ala
100 105 110

Asn Ser Phe Thr Ala Gln Leu Asn Met Phe Ala Ser Val Phe Phe Leu
115 120 125

Thr Val Ile Ser Leu Asp His Tyr Ile His Leu Ile His Pro Val Leu
130 135 140

Ser His Arg His Arg Thr Leu Lys Asn Ser Leu Ile Val Ile Ile Phe
145 150 155 160

Ile Trp Leu Leu Ala Ser Leu Ile Gly Gly Pro Ala Leu Tyr Phe Arg
165 170 175

Asp Thr Val Glu Phe Asn Asn His Thr Leu Cys Tyr Asn Asn Phe Gln
180 185 190

Lys His Asp Pro Asp Leu Thr Leu Ile Arg His His Val Leu Thr Trp
195 200 205

Val Lys Phe Ile Ile Gly Tyr Leu Phe Pro Leu Leu Thr Met Ser Ile
210 215 220

Cys Tyr Leu Cys Leu Ile Phe Lys Val Lys Lys Arg Thr Val Leu Ile
225 230 235 240

Ser Ser Arg His Lys Trp Thr Ile Leu Val Val Val Val Ala Phe Val
245 250 255

Val Cys Trp Thr Pro Tyr His Leu Phe Ser Ile Trp Glu Leu Thr Ile
260 265 270

His His Asn Ser Tyr Ser His His Val Met Gln Ala Gly Ile Pro Leu
275 280 285

Ser Thr Gly Leu Ala Phe Leu Asn Ser Cys Leu Asn Pro Ile Leu Tyr
290 295 300

Val Leu Ile Ser Lys Lys Phe Gln Ala Arg Phe Arg Ser Ser Val Ala
305 310 315 320

Glu Ile Leu Lys Tyr Thr Leu Trp Glu Val Ser Cys Ser Gly Thr Val
325 330 335

Ser Glu Gln Leu Arg Asn Ser Glu Thr Lys Asn Leu Cys Leu Leu Glu
340 345 350

Thr Ala Gln
355

(166) INFORMATION FOR SEQ ID NO: 165:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1089 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165:

ATGGGCAACC ACACGTGGGA GGGCTGCCAC GTGGACT'-::C C::GTGGACCA CCTCTTTCGG 60

CCATCCCTCT ACATCTTTGT CATCGGCGTG GGGCTGCCLYi CGAACTGCCT GGCTCTGTGG 120

GCGGCCTACC GCCAGGTGCA ACAGCGCAAC GA:^CTGGGCG TCTACCTGAT GAACCTCAG'J 180

ATGGCCGACC TGCTGTACAT CTGCACGCTG CC3CTGTGGG TGGACTACTT CCTGCACCAC 240

GACAACTGGA TCCACGGCCC CGGGTCCTGC GGTTCATCTT CTACAC27V7.T 300

ATCTACATCA GCATCGCCTT CCTGTi^CTG: ATCTCGGT :g AGGGCTACCT GGCTGTGGCC 360

GAGCCACTiX: GCTTCGCCCG CCTGCCCCGG GTCAAGACX- CCGTGGCCGT GAGCTCCG'l'.^ 420

GTGTGGGCCA CGGAGCTGGG CGCCA.M:TC;.- G'^^CCCGT'^.-:^ TCCATGACGA GCTCTTCCGA 480

GACCGCTA-A ACCACACCTT CTGCTTTGAG AAGTTCGC'JA TGGAAGGCTG GGTGGCCTGG 540

ATGAACCTGT ATCGGGTGTT CGTGGGGTTC CTCTT G ^ ^l^T G ^G :G :TCAT GCTGCTi.TG'i 600

TACCGGGGCA TCCTGCGGGG CGTGCGGGGG A.^GGTGT^^GA CGGAGGGCCA GGAGAAGGGG 660

AAGATCGCGC GGCTGGCCCT CAGGC-rCATr G^GATCG'TGC TGGTCTGCTT TGCGCCCTAT 720

CACGTGCTCT TGCTGTCCCG CAGCGCCATC TACCTGGGGC GCCCGTGGGA CTGCGGCTT'J 780

GAGGAGCGCG TCTTTTCTGC ATACCACAGC TCACTGGCTT TGACCAGCCT CAACTGTGTG 840

GCGGACCCCA TCCTCTACTG CCTGGTCAAC GAGGGCGCCG GCAGO^Al'GT GGGCAAGGCG 900

CTGCACAACC TGCTCCGCTT TCTGGGCAGC GA:;iJ\GCCCC AGGAGATGGC CAATGCCTCG 960

CTCACCCTGG AGACCCCACT CACCTCCAAG AGGAACAGCA CAGCCAAAGC CATGACTGGC 1020

AGCTGGGCCG CCACTCCGCC ■rTCCCAGGG{; GACCAGGTGC AGCTGAAGAT GCTGCCGCCA 1080

GCACAATGA 1089

(167) INFORMATION FOR SEQ ID NO: 166:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166:

Met Gly Asn His Thr Trp Glu Gly Cys His Val Asp Ser Arg Val Asp
1 5 10 15

His Leu Phe Pro Pro Ser Leu Tyr Ile Phe Val Ile Gly Val Gly Leu
20 25 30

Pro Thr Asn Cys Leu Ala Leu Trp Ala Ala Tyr Arg Glu Val Gln Gln
35 40 45

Arg Asn Glu Leu Gly Val Tyr Leu Met Asn Leu Ser Ile Ala Asp Leu
50 55 60

Leu Tyr Ile Cys Thr Leu Pro Leu Trp Val Asp Tyr Phe Leu His His
65 70 75 80

Asp Asn Trp Ile His Gly Pro Gly Ser Cys Lys Leu Phe Gly Phe Ile
85 90 95

Phe Tyr Thr Asn Ile Tyr Ile Ser Ile Ala Phe Leu Syo Cys Ile Ser
100 105 110

Val Asp Arg Tyr Leu Ala Val Ala His Pro Leu Arg Phe Ala Arg Leu
115 120 125

Arg Arg Val Lys Thr Ala Val Ala Val Ser Ser Val Val Trp Ala Thr
130 135 140

Glu Leu Gly Ala Asn Ser Ala Pro Leu Phe His Asp Glu Leu Phe Arg
145 150 155 160

Asp Arg Tyr Asn His Thr Phe Cys Phe Glu Lys Phe Pro Met Glu Gly
165 170 175

Trp Val Ala Trp Met Asn Leu Tyr Arg Val Phe Val Gly Phe Leu Phe
180 185 190

Pro Trp Ala Leu Met Leu Leu Ser Tyr Arg Gly Ile Leu Arg Ala Val
195 200 205

Arg Gly Ser Val Ser Thr Glu Arg Gln Glu Lys Ala Lys Ile Ala Arg
210 215 220

Leu Ala Leu Ser Leu Ile Ala Ile Val Leu Val Cys Phe Ala Pro Tyr
225 230 235 240

His Val Leu Leu Leu Ser Arg Ser Ala Ile Tyr Leu Gly Arg Pro Trp
245 250 255

Asp Cys Gly Phe Glu Glu Arg Val Phe Ser Ala Tyr His Ser Ser Leu
260 265 270

Ala Phe Thr Ser Leu Asn Cys Val Ala Asp Pro Ile Leu Tyr Cys Leu
275 280 285

Val Asn Glu Gly Ala Arg Ser Asp Val Ala Lys Ala Leu His Asn Leu
290 295 300

Leu Arg Phe Leu Ala Ser Asp Lys Pro Gln Glu Met Ala Asn Ala Ser
305 310 315 320

Leu Thr Leu Glu Thr Pro Leu Thr Ser Lys Arg Asn Ser Thr Ala Lys
325 330 335

Ala Met Thr Gly Ser Trp Ala Ala Thr Pro Pro Ser Gln Gly Asp Gln
340 345 350

Val Gln Leu Lys Met Leu Pro Pro Ala Gln
355 360

(168) INFORMATION FOR SEQ ID NO: 167:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:

ATGGAGTCCT CAGGCAACCC AGAGAGCACC ACCTTTTTTT ACTATGACCT TCAGAGCCAG 60

CCGTGTGAGA ACCAGGCCTG GGTCTTTGCT ACCCTCGCCA CCACTGTCCT GTACTGCCTG 120

GTGTTTCTOC TCAGCCTAGT GGGCAACAGC CTGGTCCTGT IGGT :CTGGT G-V^GTATGAG 180

AGCCTGGAGT CCCTCACCAA CATCTTCATC CTCAAC'JTGT GCCTTTCAGA CCTGGTGTTC 240

GCCTGCTTGT TGCCTGTGTG GATCTCCCCA TACCACTGGG GCTGGGTGCT GGGAGACTTC 300

CTCTGCAAAC TCCTCAATAT GATCTTCTCC ATCAGCCTCT ACAGCAGCAT CTTCTTCCTG 360

ACCATCATGA CCATCCACCG CTACCTGTCG GTAGTGAGCC CCCTCTCCAC CCTGCGCGTC 420

CCCACCCTCC GCTGCCGGGT GCTGGTGACC ATGGCTGTGT GGGTAGCCAG CATCCTGTCC 480

TCCATCCTCG ACACCATCTT CCACAAGGTG CTTTCTTCGG GCTGTGATTA TTCCGAACTC 540

ACGTGGTACC TCACCTCCGT CTACCAGCAC AACCTCTTCT TCCTGCTGTC CCTGGGGATT 600

ATCCTGTTCT GCTACGTGGA GATCCTCAGG ACCCTGTTCC GCTCACGCTC CAAGCGGCGC 660

CACCGCACGA AAAAGCTCAT CTTCGCCATC GTGGTGGCCT ACTTCCTCAG CTGGGGTCCC 720

TACAACTTCA CCCTGTTTCT GCAGACGCTG TTTCGGACCC AGATCATCCG GAGCTGCGAG 780
GCCAAACAGC AGCTAGAATA CGCCCTGCTC ATCTGCCG:.-A ACCTCGCCTT CTCCCACTGG 840

TGCTTTAACC CGGTGCTCTA TGTCTTCGTG GGGG^CAAGT TCCGCACACA GCTGAAACAI' 900

GTTCTCCGGC AGTTCTGGTT CTGCCGGCTG CAGGCACCCA GCCCAGCCTC GATCCCCCAC 960

TCCCCTGGTG CCTTCGCCTA TGAGGGCGCC TCCTTCTACT GA 1002
(169) INFORMATION FOR SEQ ID NO: 168:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168:

Met Glu Ser Ser Gly Asn Pro Glu Ser Thr Thr Phe Phe Tyr Tyr Asp
1 5 10 15

Leu Gln Ser Gln Pro Cys Glu Asn Gln Ala Trp Val Phe Ala Thr Leu
20 25 30

Ala Thr Thr Val Leu Tyr Cys Leu Val Phe Leu Leu Ser Leu Val Gly
35 40 45

Asn Ser Leu Val Leu Trp Val Leu Val Lys Tyr Glu Ser Leu Glu Ser
50 55 60

Leu Thr Asn Ile Phe Ile Leu Asn Leu Cys Leu Ser Asp Leu Val Phe
65 70 75 80

Ala Cys Leu Leu Pro Val Trp Ile Ser Pro Tyr His Trp Gly Trp Val
85 90 95

Leu Gly Asp Phe Leu Cys Lys Leu Leu Asn Met Ile Phe Ser Ile Ser
100 105 110

Leu Tyr Ser Ser Ile Phe Phe Leu Thr Ile Met Thr Ile His Arg Tyr
115 120 125

Leu Ser Val Val Ser Pro Leu Ser Thr Leu Arg Val Pro Thr Leu Arg
130 135 140

Cys Arg Val Leu Val Thr Met Ala Val Trp Val Ala Ser Ile Leu Ser
145 150 155 160

Ser Ile Leu Asp Thr Ile Phe His Lys Val Leu Ser Ser Gly Cys Asp
165 170 175

Tyr Ser Glu Leu Thr Trp Tyr Leu Thr Ser Val Tyr Gln His Asn Leu
180 185 190

Phe Phe Leu Leu Ser Leu Gly Ile Ile Leu Phe Cys Tyr Val Glu Ile
195 200 205

Leu Arg Thr Leu Phe Arg Ser Arg Ser Lys Arg Arg His Arg Thr Lys
210 215 220

Lys Leu Ile Phe Ala Ile Val Val Ala Tyr Phe Leu Ser Trp Gly Pro
225 230 235 240

Tyr Asn Phe Thr Leu Phe Leu Gln Thr Leu Phe Arg Thr Gln Ile Ile
245 250 255

Arg Ser Cys Glu Ala Lys Gln Gln Leu Glu Tyr Ala Leu Leu Ile Cys
260 265 270

Arg Asn Leu Ala Phe Ser His Cys Cys Phe Asn Pro Val Leu Tyr Val
275 280 285

Phe Val Gly Val Lys Phe Arg Thr His Leu Lys His Val Leu Arg Gln
290 295 300

Phe Trp Phe Cys Arg Leu Gln Ala Pro Ser Pro Ala Ser Ile Pro His
305 310 315 320

Ser Pro Gly Ala Phe Ala Tyr Glu Gly Ala Ser Phe Tyr
325 330

(170) INFORMATION FOR SEQ ID NO: 169:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 987 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169:

ATGGACAACG CCTCGTTCTC GGAGCCCTGG CCCGCCAACG CATCGGGCCC dl^ACCCGGCG 60

CTGAGCTGCT CCAACGCGTG GACTCTGGCG CCGCTGCCGG CGCCGCTGGC GGT5GCTGTA 120

CCAGTTGTCT ACGCGGTCAT CTGCGCCGTG GGTCTGGCGG GCAACTC(::i3C CGTGCTGTAr 180

GTGTTGCTGC GGGCGCCCCG CATGAAGACC GTCACCAACC TGTTCATCCT CAACCTGG'2C 240

ATCGCCGACG AGCTCTTCAC GCTGGTGCTG CCCATGAACA TCGCCGACTT CCTGCTGCGG 300

CAGTGGCCCT TCGGGGAGCT CATGTGGAAG CTCATCGTGG CTATCGA'::rA GTACAAGACC 360

TTCTCCAGCC TCTACTTCCT CACCGTCATG AGCGCCGACC G:TACCTGGT GGTGTTGGCC 420

ACTGCGGAGT CGCGCCGGGT GGCCGGCCGC ACCTACAGCG CCGCGCGCG2 GGTGAGCCTG 480
GCCGTGTGGCJ GGATL'GTCAC ACTCGTCGTG CTGCCCTT'r^G CAGTCTTCGC ITCGGGTAGAC 540

GACGAGCAGG GCCGGCGCCA GTGCGTGCTA GTCTTTCCGC ACCCCGAGGC CTTCTGGTGG 600

CGCGCGAGCC GCCTCTACAC GCTCGTGCTG GGCTTCGCCA TCCCCCTGTC CACCATCTGT 660

GTCCTCTATA CCAGCCTGCT GTGCCGGCTG CATGCCATGC GGCTGGACAG CCACGCCAAG 720

GCCCTGGAGC GCGGCAAGAA GCGGGTGAAG TTCCTGGTGG TGGCAATCCT GGCGGTGTGC 780

CTCCTCTGCT GGACGGCCTA CCACCTGAGC ACCGTGGTGG CGCTCACCAC CGAGGTCGCG 840

CAGACGCCGC TGGTGATCGC TATCTCCTAC TTCATCACCA GCCTGACGTA CGCCAACAGC 900

TGCCTGA.AGC .^CTTCCTCTA CGCCTTCCTG GACGCCAGCT TC':GGAGGAA CCTCCGGCAG 960

CTGATAAGTT !3CCG::GCGGG AGCCTGA 987
(171) INFORMATION FOR SEQ ID NO: 170:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 328 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:

Met Asp Asn Ala Ser Phe Ser Glu Pro Trp Pro Ala Asn Ala Ser Gly
1 5 10 15

Pro Asp Pro Ala Leu Ser Cys Ser Asn Ala Ser Thr Leu Ala Pro Leu
20 25 30

Pro Ala Pro Leu Ala Val Ala Val Pro Val Val Tyr Ala Val Ile Cys
35 40 45

Ala Val Gly Leu Ala Gly Asn Ser Ala Val Leu Tyr Val Leu Leu Arg
50 55 60

Ala Pro Arg Met Lys Thr Val Thr Asn Leu Phe Ile Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Glu Leu Phe Thr Leu Val Leu Pro Ile Asn Ile Ala Asp
85 90 95

Phe Leu Leu Arg Gln Trp Pro Phe Gly Glu Leu Met Cys Lys Leu Ile
100 105 110

Val Ala Ile Asp Gln Tyr Asn Thr Phe Ser Ser Leu Tyr Phe Leu Thr
115 120 125

Val Met Ser Ala Asp Arg Tyr Leu Val Val Leu Ala Thr Ala Glu Ser
130 135 140

Arg Arg Val Ala Gly Arg Thr Tyr Ser Ala Ala Arg Ala Val Ser Leu
145 150 155 160

Ala Val Trp Gly Ile Val Thr Leu Val Val Leu Pro Phe Ala Val Phe
165 170 175

Ala Arg Leu Asp Asp Glu Gln Gly Arg Arg Gln Cys Val Leu Val Phe
180 185 190

Pro Gln Pro Glu Ala Phe Trp Trp Arg Ala Ser Arg Leu Tyr Thr Leu
195 200 205

Val Leu Gly Phe Ala Ile Pro Val Ser Thr Ile Cys Val Leu Tyr Thr
210 215 220

Thr Leu Leu Cys Arg Leu His Ala Met Arg Leu Asp Ser His Ala Lys
225 230 235 240

Ala Leu Glu Arg Ala Lys Lys Arg Val Lys Phe Leu Val Val Ala Ile
245 250 255

Leu Ala Val Cys Leu Leu Cys Trp Thr Pro Tyr His Leu Ser Thr Val
260 265 270

Val Ala Leu Thr Thr Asp Leu Pro Gln Thr Pro Leu Val Ile Ala Ile
275 280 285

Ser Tyr Phe Ile Thr Ser Leu Thr Tyr Ala Asn Ser Cys Leu Asn Pro
290 295 300

Phe Leu Tyr Ala Phe Leu Asp Ala Ser Phe Arg Arg Asn Leu Arg Gln
305 310 315 320

Leu Ile Thr Cys Arg Ala Ala Ala
325

(172) INFORMATION FOR SEQ ID NO: 171:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171:

ATGCAGGCCG CTGGGCACCC AGAGCCCCTT GACAGCAGGG GCTCCTTCTC CCTCCCCACG 60

ATGGGTGCCA ACGTCTCTCA GGACAATGGC ACTGGCCACA ATGCCACCTT CTCCGAGCCA 120

CTGCCGTTCC TCTATGTGCT CCTGCCCGCC GTGTACTCCG GGATCTGTGC TGTGGGGCTG 180



WO 00/22129 PCT/US99/23938


ACTGGCAACA CGGCCGTCAT rCTTGTAATC CTT-^AGGGCGG CCAAGATGAA GAGGGTGACC 240

AACGTGTTCA TCC-VGPJVCC'-: GGCCGTrGCC GACGGGCTCT TGACGCTGCT ACrTGCCTGI C 300

AACATCGCGG AGCACCTGCT GCAGTACTGG CCCT-TCGGGG AGCT^.CTGTG GAAGCTGGTG 360

CTGGCCGTCG ACCACTACAA CATGTTCTCC AGCATCTAGT TCGTAGCCGT GATGAGCGT(J 420

GACCGATAC: TGGTGGTGGT GGC<JACCGTG AGGTCCCGCL^ ACATGCCGTG GCGCAGCTAC 480

CGGGGGGGGA AGGTGGCGAG CGT ^TGTGTC TGGGTGGGCG TCACGGT'TT GGTTCTGCCC 540

TTGTTCTGrT TCGCTGGCGT GTA :AGCA^-.C GAGCTGCAGG TCCC.Vv^CTG TGGGGTGAG:: 600

TTCCCGTGGC CCGAGGAGGT CTGGTTCAAG i^GCAGCGGTG TCTAGAGGTT GGTGGTGGGC 660

TTGGTGGTG-'"- CGGTGTGGAC GATGTGTGTG CTCTACACAG AGT-y^GrTGCG GAGG::TGGGG 720

GCCGTGCGGC TGCGGTCTGG AGGGAAGGCT CTAGGCAAGG CC7iGGC:GGAA GGTi^AAAGTC 780

GTGGTGCTiJiJ TGGIGG'TGGC GGTGTGCCTC GTCTGCTGGA CGCGGTyCGA GCTGGGCTCT 840

GTCGTGGCGC TGAG«:AGG,-A GGTGGCGCAG ACCGCACT(7CT "^GATCAGTAT GTGCTACGTC 900

ATCAGGAGCC^ TGACGTACGG CAAGTCGTGC GTGAAGGCC: rc^GTGTAGGG CTTTCTAGAT 960

GACAACTTGG GC:AA.;AAGTT GGGGAGGATA XGGGGGTGrT GA 1002

(173) INFORMATION FOR SEQ ID NO: 172:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172:

Met Gln Ala Ala Gly His Pro Glu Pro Leu Asp Ser Arg Gly Ser Phe
1 5 10 15

Ser Leu Pro Thr Met Gly Ala Asn Val Ser Gln Asp Asn Gly Thr Gly
20 25 30

His Asn Ala Thr Phe Ser Glu Pro Leu Pro Phe Leu Tyr Val Leu Leu
35 40 45

Pro Ala Val Tyr Ser Gly Ile Cys Ala Val Gly Leu Thr Gly Asn Thr
50 55 60

Ala Val Ile Leu Val Ile Leu Arg Ala Pro Lys Met Lys Thr Val Thr
65 70 75 80

Asn Val Phe Ile Leu Asn Leu Ala Val Ala Asp Gly Leu Phe Thr Leu
85 90 95

Val Leu Pro Val Asn Ile Ala Glu His Leu Leu Gln Tyr Trp Pro Phe
100 105 110

Gly Glu Leu Leu Cys Lys Leu Val Leu Ala Val Asp His Tyr Asn Ile
115 120 125

Phe Ser Ser Ile Tyr Phe Leu Ala Val Met Ser Val Asp Arg Tyr Leu
130 135 140

Val Val Leu Ala Thr Val Arg Ser Arg His Met Pro Trp Arg Thr Tyr
145 150 155 160

Arg Gly Ala Lys Val Ala Ser Leu Cys Val Trp Leu Gly Val Thr Val
165 170 175

Leu Val Leu Pro Phe Phe Ser Phe Ala Gly Val Tyr Ser Asn Glu Leu
180 185 190

Gln Val Pro Ser Cys Gly Leu Ser Phe Pro Trp Pro Glu Gln Val Trp
195 200 205

Phe Lys Ala Ser Arg Val Tyr Thr Leu Val Leu Gly Phe Val Leu Pro
210 215 220

Val Cys Thr Ile Cys Val Leu Tyr Thr Asp Leu Leu Arg Arg Leu Arg
225 230 235 240

Ala Val Arg Leu Arg Ser Gly Ala Lys Ala Leu Gly Lys Ala Arg Arg
245 250 255

Lys Val Lys Val Leu Val Leu Val Val Leu Ala Val Cys Leu Leu Cys
260 265 270

Trp Thr Pro Phe His Leu Ala Ser Val Val Ala Leu Thr Thr Asp Leu
275 280 285

Pro Gln Thr Pro Leu Val Ile Ser Met Ser Tyr Val Ile Thr Ser Leu
290 295 300

Thr Tyr Ala Asn Ser Cys Leu Asn Pro Phe Leu Tyr Ala Phe Leu Asp
305 310 315 320

Asp Asn Phe Arg Lys Asn Phe Arg Ser Ile Leu Arg Cys
325 330

(174) INFORMATION FOR SEQ ID NO: 173:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1107 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:

ATGGTCCTTG A3GTGAGTGA CCACCAAGT:^ CTAAATGACG CCGAGGTTGC CGCGCTCCTG 60

GAGAACTTCA GCTCTTCCTA TGACTATGGA GAAAACGAGA G'lGACTCGTG rTGTACCTGC 120

CCGCCCTGCC CACAGGACTT CAGCCTGAA:' TTCGACCGGG CCTTCCTGCG AGCCCTCTA'-^ 180

AGCCTCCTCT TTCTGCTGGG GCTGCTGGGG aj^CGGCC-CGG TGGGAGCCGT Gi:tGCTGAGG 240

CGGCGGACAG CCCTGAGCAG CACCGACACC TTCCTGCTCC ACCTAGCTGT AGCAGACAC-::; 300

CTGCTGGTGC T.IACACTGCC GCTCTGGGCA GTGGACGCTG CCGTCCAGTG GGT:^TTTGGC 360

TCTGGCCTCT GGAAAGTGGC AGGTGCGCTC TTGAACATGA ACTTCTACGC AGGAGCCCT-: 420

CTGCTGGCCT GGATCAGCTT TGACCGrTAC CTGAACATAG TTCATGCCAC CCA<?GTGTAG 480

CGCCGGGGGC CCCCGGCCCG CGTGACGCTC ACCTGCCTGG CTGTCTGG JG GCTCTGCCT:- 540

CTTTTCGCGC TCCCAGACTT CATGTTCCTG TCGGCCCA'::C ACGACGAGCG CGTCAA'^GC: 600

ACCCACTGGC A.VrACAACT'T^ CCCACArGTG GGCCGCACGG CTCTCOGGGT GL^T'^CAGCTG 660

GTGGCTGGGT TTCTGCTGCC CGTGCTGGT0 ATGGCCTACT GCTATGCCCA CAT :CT ^GC':,' 720

GTGCTGCTGG T^^TCCAGGGG CGAGCG-^GG^: CT3CGGGCCA AGCGG'^TGGT GGT'^JGTGGTC 780

GTGGTGGCCr TTGCCCTCTG CTGGACCCCC TATCACCTGG TGGTG'-TGG-T GGAOATGCTi;: 840

ATGGACGT'4G GCGCrrTGGG C'.:GCAACTGT GGCCGAGAA/k GCAGGGTAGA CGTC-GCCA^AC- 900

TCGGTCACCT CAGGCCTGG^ CTACATGGAC TGCTGCCTCA ACCCGCTGCT CTATGCCTTT 960

GTAGGGGTGA AGTTCCGGGA GGGGATGTGG ATGCTGCTCT TGCGCCTGGG CTGCCCCAAC 1020

CAGAGAGGGC TCCAGAGGCA GCCATCCTCT TCCCGCCGGG ATTCATCCTG GTCTGAGACC 1080

TCAGAGGCGT CCTACTCGGG CTTGTGA 1107

(175) INFORMATION FOR SEQ ID NO: 174:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174:

Met Val Leu Glu Val Ser Asp His Gln Val Leu Asn Asp Ala Glu Val
1 5 10 15

Ala Ala Leu Leu Glu Asn Phe Ser Ser Ser Tyr Asp Tyr Gly Glu Asn
20 25 30

Glu Ser Asp Ser Cys Cys Thr Ser Pro Pro Cys Pro Gln Asp Phe Ser
35 40 45

Leu Asn Phe Asp Arg Ala Phe Leu Pro Ala Leu Tyr Ser Leu Leu Phe
50 55 60

Leu Leu Gly Leu Leu Gly Asn Gly Ala Val Ala Ala Val Leu Leu Ser
65 70 75 80

Arg Arg Thr Ala Leu Ser Ser Thr Asp Thr Phe Leu Leu His Leu Ala
85 90 95

Val Ala Asp Thr Leu Leu Val Leu Thr Leu Pro Leu Trp Ala Val Asp
100 105 110

Ala Ala Val Gln Trp Val Phe Gly Ser Gly Leu Cys Lys Val Ala Gly
115 120 125

Ala Leu Phe Asn Ile Asn Phe Tyr Ala Gly Ala Leu Leu Leu Ala Cys
130 135 140

Ile Ser Phe Asp Arg Tyr Leu Asn Ile Val His Ala Thr Gln Leu Tyr
145 150 155 160

Arg Arg Gly Pro Pro Ala Arg Val Thr Leu Thr Cys Leu Ala Val Trp
165 170 175

Gly Leu Cys Leu Leu Phe Ala Leu Pro Asp Phe Ile Phe Leu Ser Ala
180 185 190

His His Asp Glu Arg Leu Asn Ala Thr His Cys Gln Tyr Asn Phe Pro
195 200 205

Gln Val Gly Arg Thr Ala Leu Arg Val Leu Gln Leu Val Ala Gly Phe
210 215 220

Leu Leu Pro Leu Leu Val Met Ala Tyr Cys Tyr Ala His Ile Leu Ala
225 230 235 240

Val Leu Leu Val Ser Arg Gly Gln Arg Arg Leu Arg Ala Lys Arg Leu
245 250 255

Val Val Val Val Val Val Ala Phe Ala Leu Cys Trp Thr Pro Tyr His
260 265 270

Leu Val Val Leu Val Asp Ile Leu Met Asp Leu Gly Ala Leu Ala Arg
275 280 285

Asn Cys Gly Arg Glu Ser Arg Val Asp Val Ala Lys Ser Val Thr Ser
290 295 300

Gly Leu Gly Tyr Met His Cys Cys Leu Asn Pro Leu Leu Tyr Ala Phe
305 310 315 320

Val Gly Val Lys Phe Arg Glu Arg Met Trp Met Leu Leu Leu Arg Leu
325 330 335

Gly Cys Pro Asn Gln Arg Gly Leu Gln Arg Gln Pro Ser Ser Ser Arg
340 345 350

Arg Asp Ser Ser Trp Ser Glu Thr Ser Glu Ala Ser Tyr Ser Gly Leu
355 360 365

(176) INFORMATION FOR SEQ ID NO: 175:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1074 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS : single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175:

ATGGCTGATG ACTATGGCTC TGAATCCACA TCTTCCATGG AAGACTACGT LAACTTCAAC 60

TTCACTGACT TCTACTGTGA GAAAAAGAAT GTCAGGGAGT TTCX^GAGCCA TTTCCTCCCA 120

CGCTTGTACT GGCTCGTGTT CATCGTGGGT GCCTTGGG.'^:. ACAGTCTTCT TATCCTTGTC 180

TACTGGTACT GCACAAGAGT GAAGACCATG ACCGACATGT TCCTTTTGAA TTTGGCAATT 240

GCTGACCTCC TCTTTCTTGT CACTCTTCCC TTCTGGGCCA TTGCTGCTGC TGACCAGTGG 300

AAGTTCCAGA CCTTCATGTG CAAGGTGGTC AACAGCATGT ACAAGATGAA CTTCTACAGC 360

TGTGTGTTGC TCATCATGTG CATCAGCGTG GACAGGTACA TTGCCATTGC CCAGGCCATG 420

AGAGCACATA CTTGGAGGGA GAAAAGGCTT TTGTACACCA AAATGGTTTG CTTTACCATC 480

TGGGTATTGG CAGCTGCTCT CTGCATGCCA GAAATCTTAT ACAGCCAAAT :AAGGAGG?Ji 540

TCGGGCATTG CTATCTGCAC CATGGTTTAC CCTAGCGATG AGAGCACCAA ACTGAAGTCA 600

GCTGTCTTGA CCCTGAAGGT CATTCTGGGG TTCTTCCTTC CCTTCGTGGT CATGGCTTGG 660

TGCTATACCA TGATCATTCA CACCCTGATA CAAGCCAAGA AGTCTTCCAA GCACAAAGCC 720

AAGAAAGTGA CCATCACTGT CCTil^ACCGTC TTTGTCTTGT CTCAGTTTCC CTACAACTG:: 780

ATTTTGTTGG TGCAGACCAT TGACGCCTAT GCCATGTT2A TCTCCAACTG TGGCGTTTCC 840

ACCAACATTG ACATCTGCTT CCAGGTCACC CAGACCATCG CCTTCTTCCA CAGTTGCCTG 900

AACCCTGTTC TCTATCTTTT TCTGGGTGAG AGATTCCGCC GGGATCTCGT GAAAACCCTG 960

AAGAACTTGG GTTGCATCAG CCAGGCCCAG TGGGTTTCAT TTACAAGGAG AGAGGGAAGC 1020

TTGAAGCTGT CGTCTATGTT GCTGGAGACA ACCTCAGGAG CACTCTCCCT CTGA 1074

(177) INFORMATION FOR SEQ ID NO: 176:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 357 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:

Met Ala Asp Asp Tyr Gly Ser Glu Ser Thr Ser Ser Met Glu Asp Tyr
1 5 10 15

Val Asn Phe Asn Phe Thr Asp Phe Tyr Cys Glu Lys Asn Asn Val Arg
20 25 30

Gln Phe Ala Ser His Phe Leu Pro Pro Leu Tyr Trp Leu Val Phe Ile
35 40 45

Val Gly Ala Leu Gly Asn Ser Leu Val Ile Leu Val Tyr Trp Tyr Cys
50 55 60

Thr Arg Val Lys Thr Met Thr Asp Met Phe Leu Leu Asn Leu Ala Ile
65 70 75 80

Ala Asp Leu Leu Phe Leu Val Thr Leu Pro Phe Trp Ala Ile Ala Ala
85 90 95

Ala Asp Gln Trp Lys Phe Gln Thr Phe Met Cys Lys Val Val Asn Ser
100 105 110

Met Tyr Lys Met Asn Phe Tyr Ser Cys Val Leu Leu Ile Met Cys Ile
115 120 125

Ser Val Asp Arg Tyr Ile Ala Ile Ala Gln Ala Met Arg Ala His Thr
130 135 140

Trp Arg Glu Lys Arg Leu Leu Tyr Ser Lys Met Val Cys Phe Thr Ile
145 150 155 160

Trp Val Leu Ala Ala Ala Leu Cys Ile Pro Glu Ile Leu Tyr Ser Gln
165 170 175

Ile Lys Glu Glu Ser Gly Ile Ala Ile Cys Thr Met Val Tyr Pro Ser
180 185 190

Asp Glu Ser Thr Lys Leu Lys Ser Ala Val Leu Thr Leu Lys Val Ile
195 200 205

Leu Gly Phe Phe Leu Pro Phe Val Val Met Ala Cys Cys Tyr Thr Ile
210 215 220

Ile Ile His Thr Leu Ile Gln Ala Lys Lys Ser Ser Lys His Lys Ala
225 230 235 240

Lys Lys Val Thr Ile Thr Val Leu Thr Val Phe Val Leu Ser Gln Phe
245 250 255

Pro Tyr Asn Cys Ile Leu Leu Val Gln Thr Ile Asp Ala Tyr Ala Met
260 265 270

Phe Ile Ser Asn Cys Ala Val Ser Thr Asn Ile Asp Ile Cys Phe Gln
275 280 285

Val Thr Gln Thr Ile Ala Phe Phe His Ser Cys Leu Asn Pro Val Leu
290 295 300

Tyr Val Phe Val Gly Glu Arg Phe Arg Arg Asp Leu Val Lys Thr Leu
305 310 315 320

Lys Asn Leu Gly Cys Ile Ser Gln Ala Gln Trp Val Ser Phe Thr Arg
325 330 335

Arg Glu Gly Ser Leu Lys Leu Ser Ser Met Leu Leu Glu Thr Thr Ser
340 345 350

Gly Ala Leu Ser Leu
355

(178) INFORMATION FOR SEQ ID NO: 177:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1110 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS : single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177:

ATGGCCTCAT CGACCACTCG GGGCCCCAGG GTTTCTGACT TATTTTCTGG GCTGCCGCCG 60

GCGGTCACAA CTCCCGCCAA CCAGAGCGCA GAGGCCTCGG CGGCCAACGC GTC^GTGGCT 120

GGCGCGGACG CTCCAGCCGT CACGCCCTTC CAGAGCCTGC AGCTGGTGCA TCAGCTGAA.<S 180

GGGCTGATCG TGCTGCTCTA CAGCGTCGTG GTGGTCGTGG GGCTGGTGGG CAACTGCCTG 240

CTGGTGCTGG TGATCGCGCG GGTGCCGCGG CTGCACAACG TGACGAACTT CCTCATCGGC 300

AACCTGGCCT TGTCCGACGT GCTCATGTGC ACCGCCTGCG TGCCGCTGAC GCTGGCCTAT 360

GCCTTCGAGC CACGCGGCTG GGTGTTCGGC GGCGGCCTGT GGCACCTGGT CTTCTTCCTG 420

CAGCCGGTCA CCGTCTATGT GTCGGTGTTC ACGCTCACCA CCATCGCAGT GGACCGCTAC 480

GTCGTGCTGG TGCACCCGCT GAGGCGCGCA TCTCGCTGCG CCTCAGCCTA CGCTGTGCTG 540

GCCATCTGGG CGCTGTCCGC GGTGCTGGCG CTGCCGCCCG CCGTGCACAC CTATCACGTG 600

GAGCTCAAGC CGCACGACGT GCGCCTCTGC GAGGAGTTCT GGGGCTCCCA GGAGCGCCAG 660

CGCCAGCTCT ACGCCTGGGG GCTGCTGCTG GTCATCTACC TGCTCCCTCT GCTGGTCATC 720

CTCCTGTCTT ACGTCCGGGT GTCAGTGAAG CTCCGCAACC GOGTGGTGCC GGGCTGCGTG 780

ACCCAGAGCC AGGCCGACTG GGACCGCGCT CGGCGCCGGC GCACCAAATG CTTGCTGGTG 840

GTGGTCGTGG TGGTGTTCGC CGTCTGCTGG CTGCCGCTGC ACGTCTTCAA CCTGCTGCGG 900

GACCTCGACC CCCACGCCAT CGACCCTTAC GCCTTTGGGC TGGTGCAGCT GCTCTGCCAC 960

TGGCTCGCCA TGAGTTCGGC CTGCTACAAC CCCTTCAT'rT ACGCCTGGCT GCACGACAGC 1020

TTCCGCGAGG AGCTGCGCAA ACTGTTGGTC GCTTGGCCCC GTAAGATAGC CCCCCATGGC 1080

CAGAATATGA CCGTCAGCGT GGTCATCTGA 1110

(179) INFORMATION FOR SEQ ID NO: 178:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 369 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:

Met Ala Ser Ser Thr Thr Arg Gly Pro Arg Val Ser Asp Leu Phe Ser
1 5 10 15

Gly Leu Pro Pro Ala Val Thr Thr Pro Ala Asn Gln Ser Ala Glu Ala
20 25 30

Ser Ala Gly Asn Gly Ser Val Ala Gly Ala Asp Ala Pro Ala Val Thr
35 40 45

Pro Phe Gln Ser Leu Gln Leu Val His Gln Leu Lys Gly Leu Ile Val
50 55 60

Leu Leu Tyr Ser Val Val Val Val Val Gly Leu Val Gly Asn Cys Leu
65 70 75 80

Leu Val Leu Val Ile Ala Arg Val Pro Arg Leu His Asn Val Thr Asn
85 90 95

Phe Leu Ile Gly Asn Leu Ala Leu Ser Asp Val Leu Met Cys Thr Ala
100 105 110

Cys Val Pro Leu Thr Leu Ala Tyr Ala Phe Glu Pro Arg Gly Trp Val
115 120 125

Phe Gly Gly Gly Leu Cys His Leu Val Phe Phe Leu Gln Pro Val Thr
130 135 140

Val Tyr Val Ser Val Phe Thr Leu Thr Thr Ile Ala Val Asp Arg Tyr
145 150 155 160

Val Val Leu Val His Pro Leu Arg Arg Ala Ser Arg Cys Ala Ser Ala
165 170 175

Tyr Ala Val Leu Ala Ile Trp Ala Leu Ser Ala Val Leu Ala Leu Pro
180 185 190

Pro Ala Val His Thr Tyr His Val Glu Leu Lys Pro His Asp Val Arg
195 200 205

Leu Cys Glu Glu Phe Trp Gly Ser Gln Glu Arg Gln Arg Gln Leu Tyr
210 215 220

Ala Trp Gly Leu Leu Leu Val Thr Tyr Leu Leu Pro Leu Leu Val Ile
225 230 235 240

Leu Leu Ser Tyr Val Arg Val Ser Val Lys Leu Arg Asn Arg Val Val
245 250 255

Pro Gly Cys Val Thr Gln Ser Gln Ala Asp Trp Asp Arg Ala Arg Arg
260 265 270

Arg Arg Thr Lys Cys Leu Leu Val Val Val Val Val Val Phe Ala Val
275 280 285

Cys Trp Leu Pro Leu His Val Phe Asn Leu Leu Arg Asp Leu Asp Pro
290 295 300

His Ala Ile Asp Pro Tyr Ala Phe Gly Leu Val Gln Leu Leu Cys His
305 310 315 320

Trp Leu Ala Met Ser Ser Ala Cys Tyr Asn Pro Phe Ile Tyr Ala Trp
325 330 335

Leu His Asp Ser Phe Arg Glu Glu Leu Arg Lys Leu Leu Val Ala Trp
340 345 350

Pro Arg Lys Ile Ala Pro His Gly Gln Asn Met Thr Val Ser Val Val
355 360 365

Ile






(180) INFORMATION FOR SEQ ID NO: 179:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1083 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:

ATGGACCCAG AAGAAACTTC AGTTTATTTG GATTATTACT ATGCTACGAG CCC/iAACTCT 60

GACATCAGGG AGACCCACTC CCATGTTCCT TACACCTCTG TCTTCCTTCC AGTCTTTTAC 120

ACAGCTGTGT TCCTGACTGG AGTGCTGGGC AACCTTGTTC TCATGGGAGC GTTGCATTTC 180

AAACCCGGCA GCCGAAGACT GATCGACATC TTTATCATCA ATCTGGCTGC CTCTGACTTC 240

ATTTTTCTTG TCACATTGCC TCTCTGGGTG GATAAAGAAG CATCTCTAGG ACTGTGGAGG 300

ACGGGCTCCT TCCTGTGCAA AGGGAGCTCC TACATGATCT CCGTCAATAT GCACTGCAGT 360

GTCCTCCTGC TCACTTGCAT GAGTGTTGAC CGCTACCTGG CCATTGTGTG GCCAGTCGTA 420

TCCAGGAAAT TCAGAAGGAC AGACTGTGCA TATGTAGTCT GTGCCAGCAT CTGGTTTATC 480

TCCTGCCTGC TGGGGTTGCC TACTCTTCTG TCCAGGGAGC TCACGCTGAT TGATGATAAG 540

CCATACTGTG CAGAGAAAAA GGCAACTCCA ATTAAACTCA TATGGTCCCT SGTGGCCTTA 600

ATTTTCACCT TTTTTGTCCC TTTGTTGAGC ATTGTGACCT GCTACTGTTG rATTGC/iAGCJ 660

AAGCTGTGTG CCCATTACCA GCAATCAGGA AAGCACAACA AAAAG:TGAA GAA^^TCTAAG 720

AAGATCATCT TTATTGTCGT GGCAGCCTTT CTTGTCTCCT GGCTGCCCTT CAATACTTTC 780

AAGTTCCTGG CCATTGTCTC TGGGTTGCGG CAAGAACACT ATTTACCCTC AGCTATTCTT 840

CAGCTTGGTA TGGAGGTGAG TGGACCCTTG GCATTTGCCA ACAGCTGTGT CAACCCTTTC 900

ATTTACTATA TCTTCGACAG CTACATCCGC CGGGCCATTG TCCACTGCTT GTGCCCTTGC 960

CTGAAAAACT ATGACTTTGG GAGTAGCACT GAGACATCAG ATAGTCACCT CACTAAGGCT 1020

CTCTCCACCT TCATTCA"GC AGAAGATTTT GCCAGGAGGA GGAAGAGGTC TGTGTCACTC 1080

TAA 1083

(181) INFORMATION FOR SEQ ID NO: 180: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 360 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180:

Met Asp Pro Glu Glu Thr Ser Val Tyr Leu Asp Tyr Tyr Tyr Ala Thr
1 5 10 15

Ser Pro Asn Ser Asp Ile Arg Glu Thr His Ser His Val Pro Tyr Thr
20 25 30

Ser Val Phe Leu Pro Val Phe Tyr Thr Ala Val Phe Leu Thr Gly Val
35 40 45

Leu Gly Asn Leu Val Leu Met Gly Ala Leu His Phe Lys Pro Gly Ser
50 55 60

Arg Arg Leu Ile Asp Ile Phe Ile Ile Asn Leu Ala Ala Ser Asp Phe
65 70 75 80

Ile Phe Leu Val Thr Leu Pro Leu Trp Val Asp Lys Glu Ala Ser Leu
85 90 95

Gly Leu Trp Arg Thr Gly Ser Phe Leu Cys Lys Gly Ser Ser Tyr Met
100 105 110

Ile Ser Val Asn Met His Cys Ser Val Leu Leu Leu Thr Cys Met Ser
115 120 125

Val Asp Arg Tyr Leu Ala Ile Val Trp Pro Val Val Ser Arg Lys Phe
130 135 140

Arg Arg Thr Asp Cys Ala Tyr Val Val Cys Ala Ser Ile Trp Phe Ile
145 150 155 160

Ser Cys Leu Leu Gly Leu Pro Thr Leu Leu Ser Arg Glu Leu Thr Leu
165 170 175

Ile Asp Asp Lys Pro Tyr Cys Ala Glu Lys Lys Ala Thr Pro Ile Lys
180 185 190

Leu Ile Trp Ser Leu Val Ala Leu Ile Phe Thr Phe Phe Val Pro Leu
195 200 205

Leu Ser Ile Val Thr Cys Tyr Cys Cys Ile Ala Arg Lys Leu Cys Ala
210 215 220

His Tyr Gln Gln Ser Gly Lys His Asn Lys Lys Leu Lys Lys Ser Lys
225 230 235 240

Lys Ile Ile Phe Ile Val Val Ala Ala Phe Leu Val Ser Trp Leu Pro
245 250 255

Phe Asn Thr Phe Lys Phe Leu Ala Ile Val Ser Gly Leu Arg Gln Glu
260 265 270

His Tyr Leu Pro Ser Ala Ile Leu Gln Leu Gly Met Glu Val Ser Gly
275 280 285

Pro Leu Ala Phe Ala Asn Ser Cys Val Asn Pro Phe Ile Tyr Tyr Ile
290 295 300

Phe Asp Ser Tyr Ile Arg Arg Ala Ile Val His Cys Leu Cys Pro Cys
305 310 315 320

Leu Lys Asn Tyr Asp Phe Gly Ser Ser Thr Glu Thr Ser Asp Ser His
325 330 335

Leu Thr Lys Ala Leu Ser Thr Phe Ile His Ala Glu Asp Phe Ala Arg
340 345 350

Arg Arg Lys Arg Ser Val Ser Leu
355 360

(182) INFORMATION FOR SEQ ID NO: 181:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1020 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181:

ATGAATGGCC TTGAAGTGGC TCCCCCAGGT CTGATCAC2A ACTTCTCCCT GGCCACGGCA 60

GAGCAATGTG GCCAGGAGAC GCCACTGGAG AACATGCTGT TCGCCTCCTT CTACCTTCTG 120

GATTTTATCC TGGCTTTAGT TGGCAATACC CTGGCTCTGT GGCTTTTCAT CCGAGACCAC 180

AAGTCCGGGA CCCCGGCCAA CGTGTTCCTG ATGCATCTGG CCGTGGCCGA CTTGTCGTGC 240

GTGCTGGTCC TGCCCACCCG CCTGGTCTAC CACTTCTCTG GGAACCACTG GCCATTTGGG 300

GAAATCGCAT GCCGTCTCAC CGGCTTCCTC TTCTACCTCA ACATGTACGC CAGCATCTAC 360

TTCCTCACCT GCATCAGCGC CGACCGTTTC CTGGCCATTG TGCACCCGGT CAAGTCCCTC 420

AAGCTCCGCA GGCCCCTCTA CGCACACCTG GCCTGTGCCT TCCTGTGGGT GGTGGTGGCT 480

GTGGCGATGG CCCCGCTGCT GGTGAGCCCA CACACCGTCC AGACCAACCA CACGGTGGTC 540

TGCCTGCAGC TGTACCGGGA GAAGGCCTCC CACCATGCCC TGGTGTCCCT GGCAGTGGCC 600

TTCACCTTCC CGTTCATCAC CACGGTCACC TGCTACCTGC TGATCATCCG CAGCCTGCGG 660

CAGGGCCTGC GTGTGGAGAA GCGCCTCAAG ACCAAGGCAA AACGCATGAT CGCCATAGTG 720

CTGGCCATCT TCCTGGTCTG CTTCGTGCCC TACCACGTCA ACCGCTCCGT CTACGTGCTG 780

CACTACCGCA GCCATGGGGC CTCCTGCGCC ACCCAGCGCA TCCTGGCCCT GGCAAACCGC 840

ATCACCTCCT GCCTCACCAG CCTCAACGGG GCACTCGACC CCATCATGTA TTTCTTCGTG 900

GCTGAGAAGT TCCGCCACGC CCTGTGCAAC TTGCTCTGTG GCAAAAGGCT CAAGGGGGCG 960

CCCCCCAGCT TCGAAGGGAA AACCAACGAG AGCTCGCTGA GTGCCAAGTC AGAGCTGTGA 1020

(183) INFORMATION FOR SEQ ID NO: 182:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182:

Met Asn Gly Leu Glu Val Ala Pro Pro Gly Leu Ile Thr Asn Phe Ser
1 5 10 15

Leu Ala Thr Ala Glu Gln Cys Gly Gln Glu Thr Pro Leu Glu Asn Met
20 25 30

Leu Phe Ala Ser Phe Tyr Leu Leu Asp Phe Ile Leu Ala Leu Val Gly
35 40 45

Asn Thr Leu Ala Leu Trp Leu Phe Ile Arg Asp His Lys Ser Gly Thr
50 55 60

Pro Ala Asn Val Phe Leu Met His Leu Ala Val Ala Asp Leu Ser Cys
65 70 75 80

Val Leu Val Leu Pro Thr Arg Leu Val Tyr His Phe Ser Gly Asn His
85 90 95

Trp Pro Phe Gly Glu Ile Ala Cys Arg Leu Thr Gly Phe Leu Phe Tyr
100 105 110

Leu Asn Met Tyr Ala Ser Ile Tyr Phe Leu Thr Cys Ile Ser Ala Asp
115 120 125

Arg Phe Leu Ala Ile Val His Pro Val Lys Ser Leu Lys Leu Arg Arg
130 135 140

Pro Leu Tyr Ala His Leu Ala Cys Ala Phe Leu Trp Val Val Val Ala
145 150 155 160

Val Ala Met Ala Pro Leu Leu Val Ser Pro Gln Thr Val Gln Thr Asn
165 170 175

His Thr Val Val Cys Leu Gln Leu Tyr Arg Glu Lys Ala Ser His His
180 185 190

Ala Leu Val Ser Leu Ala Val Ala Phe Thr Phe Pro Phe Ile Thr Thr
195 200 205

Val Thr Cys Tyr Leu Leu Ile Ile Arg Ser Leu Arg Gln Gly Leu Arg
210 215 220

Val Glu Lys Arg Leu Lys Thr Lys Ala Lys Arg Met Ile Ala Ile Val
225 230 235 240

Leu Ala Ile Phe Leu Val Cys Phe Val Pro Tyr His Val Asn Arg Ser
245 250 255

Val Tyr Val Leu His Tyr Arg Ser His Gly Ala Ser Cys Ala Thr Gln
260 265 270

Arg Ile Leu Ala Leu Ala Asn Arg Ile Thr Ser Cys Leu Thr Ser Leu
275 280 285

Asn Gly Ala Leu Asp Pro Ile Met Tyr Phe Phe Val Ala Glu Lys Phe
290 295 300

Arg His Ala Leu Cys Asn Leu Leu Cys Gly Lys Arg Leu Lys Gly Pro
305 310 315 320

Pro Pro Ser Phe Glu Gly Lys Thr Asn Glu Ser Ser Leu Ser Ala Lys
325 330 335

Ser Glu Leu



(184) INFORMATION FOR SEQ ID NO: 183:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 996 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: 

ATGATCACCC TGAACAATCA AGATCAACCT GTCCCTTTTA ACAGCTCACA TCCAGATGAA 60

TACAAAATTG CAGCCCTTGT CTTCTATAGC TGTATCTTCA TAATTGGATT ATTTGTTAAC 120

ATCACTGCAT TATGGGTTTT CAGTTGTACC ACCAAGAAGA GAACCACGGT AACCATCTAT 180

ATGATGAATG TGGCATTAGT GGACTTGATA TTTATAATGA CTTTACCCTT TCGAATGTTT 240

TATTATGCAA AAGATGAATG GCCATTTGGA GAGTACTTCT GCCAGATTCT TGGAGCTCTC 300

ACAGTGTTTT ACCCAAGCAT TGCTTTATGG CTTCTTGCCT TTATTAGTGC TGACAGATAC 360

ATGGCCATTG TACAGCCGAA GTACGCCAAA GAACTTAAAA ACACGTGCAA AGCCGTGCTG 420

GCGTGTGTGG GAGTCTGGAT AATGACCCTG ACCACGACCA CCCCTCTGCT ACTGCTCTAT 480

AAAGACCCAG ATAAAGACTC CACTCCCGCC ACCTGCCTCA AGATTTCTGA CATCATCTAT 540

CTAAAAGCTG TGAACGTGCT GAACCTCACT CGACTGACAT TTTTTTTCTT GATTCCTTTG 600

TTCATCATGA TTGGGTGCTA CTTGGTCATT ATTCATAATC TCCTTCACGG CAGGACGTCT 660

AAGCTGAAAC CCAAAGTCAA GGAGAAGTCC AAAAGGATCA TCATCACGCT GCTGGTGCAG 720

GTGCTCGTCT GCTTTATGCC CTTCCACATC TGTTTCGCTT TCCTGATGCT GGGAACGGGG 780

GAGAATAGTT ACAATCCCTG GGGAGCCTTT ACCACCTTCC TCATGAACCT CAGCACGTGT 840

CTGGATGTGA TTCTCTACTA CATCGTTTCA AAACAATTTC AGGCTCGAGT CATTAGTGTC 900

ATGCTATACC GTAATTACCT TCGAAGGATG CGCAGAAAAA GTTTCCGATC TGGTAGTCTA 960

AGGTCACTAA GCAATATAAA CAGTGAAATG TTATGA 996

(185) INFORMATION FOR SEQ ID NO: 184:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 331 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184:

Met Ile Thr Leu Asn Asn Gln Asp Gln Pro Val Pro Phe Asn Ser Ser
1 5 10 15

His Pro Asp Glu Tyr Lys Ile Ala Ala Leu Val Phe Tyr Ser Cys Ile
20 25 30

Phe Ile Ile Gly Leu Phe Val Asn Ile Thr Ala Leu Trp Val Phe Ser
35 40 45

Cys Thr Thr Lys Lys Arg Thr Thr Val Thr Ile Tyr Met Met Asn Val
50 55 60

Ala Leu Val Asp Leu Ile Phe Ile Met Thr Leu Pro Phe Arg Met Phe
65 70 75 80

Tyr Tyr Ala Lys Asp Glu Trp Pro Phe Gly Glu Tyr Phe Cys Gln Ile
85 90 95

Leu Gly Ala Leu Thr Val Phe Tyr Pro Ser Ile Ala Leu Trp Leu Leu
100 105 110

Ala Phe Ile Ser Ala Asp Arg Tyr Met Ala Ile Val Gln Pro Lys Tyr
115 120 125

Ala Lys Glu Leu Lys Asn Thr Cys Lys Ala Val Leu Ala Cys Val Gly
130 135 140

Val Trp Ile Met Thr Leu Thr Thr Thr Thr Pro Leu Leu Leu Leu Tyr
145 150 155 160

Lys Asp Pro Asp Lys Asp Ser Thr Pro Ala Thr Cys Leu Lys Ile Ser
165 170 175

Asp Ile Ile Tyr Leu Lys Ala Val Asn Val Leu Asn Leu Thr Arg Leu
180 185 190

Thr Phe Phe Phe Leu Ile Pro Leu Phe Ile Met Ile Gly Cys Tyr Leu
195 200 205

Val Ile Ile His Asn Leu Leu His Gly Arg Thr Ser Lys Leu Lys Pro
210 215 220

Lys Val Lys Glu Lys Ser Lys Arg Ile Ile Ile Thr Leu Leu Val Gln
225 230 235 240

Val Leu Val Cys Phe Met Pro Phe His Ile Cys Phe Ala Phe Leu Met
245 250 255

Leu Gly Thr Gly Glu Asn Ser Tyr Asn Pro Trp Gly Ala Phe Thr Thr
260 265 270

Phe Leu Met Asn Leu Ser Thr Cys Leu Asp Val Ile Leu Tyr Tyr Ile
275 280 285

Val Ser Lys Gln Phe Gln Ala Arg Val Ile Ser Val Met Leu Tyr Arg
290 295 300

Asn Tyr Leu Arg Ser Met Arg Arg Lys Ser Phe Arg Ser Gly Ser Leu
305 310 315 320

Arg Ser Leu Ser Asn Ile Asn Ser Glu Met Leu
325 330

(186) INFORMATION FOR SEQ ID NO: 185:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1077 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185:

ATGCCCTCTG TGTCTCCAGC CGGGCCCTCG GCCGGGGCAG TCCCCAATGC CACCGCAGTG 60

ACAACAGTGC GGACCAATGC CAGCGGGCTG GAGGTGCCCC TGTTCCACCT GTTTGCCCGG 120

CTGGACGAGG AGCTGCATGG CACCTTCCCA GGCCTGTGCG TGGCGCTGAT GGCGGTGCAC 180

GGAGCCATCT TCCTGGCAGG GCTGGTGCTC AACGGGCTGG CGCTGTACGT CTTCTGCTGC 240

CGCACCCGGG CCAAGACACC CTCAGTCATC TACACCATCA ACCTGGTGGT GACCGATCTA 300

CTGGTAGGGC TGTCCCTGCC CACGCGCTTC GCTGTGTACT ACGGCGCCAG GGGCTGCCTG 360

CGCTGTGCCT TCCCGCACGT CCTCGGTTAC TTCCTCAACA TGCACTGCTC CATCCTCTTC 420

CTCACCTGCA TCTGCGTGGA CCGCTACCTG GCCATCGTGC GGCCCGAAGG CTCCCGCCGC 480

TGCCGCCAGC CTGCCTGTGC CAGGGCCGTG TGCGCCTTCG TGTGGCTGGC CGCCGGTGCC 540

GTCACCCTGT CGGTGCTGGG CGTGACAGGC AGCCGGCCCT G:TGCCGTGT CTTTGCGCTG 600

ACTGTCCTGG AGTTCCTGCT GCCCCTGCTG GTCATCAGCG TGTTTACCGG CCGCATCATG 660

TGTGCACTGT CGCGGCCGGG TCTGCTCCAC CAGGGTCGJC AGCG^CGCGT GCGGGCCAAG 720

CAGCTCCTGC TCACGGTGCT CATCATCTTT CTCGTCTGC'; VCACGCCr/n CCACGCCCGC 780

CAAGTGGCCG TGGCGCTGTG GCCCGACATG CCACACCA*-A CGAG':':T;:GT GGTCTACCAC 840

GTGGCCGTGA CCCTCAGCAG CCTCAACAGC TGCATGGACC CCATirGTOTA CTGCTTCGTC 900

ACCAGTGGCT TCCAGGCCAC CGTCCGAGGC CTCTTCGGCC AGCACGGAGA GCGTGAGCCC 960

AGCAGCGGTG ACGTGGTCAG CATGCACAGG AGCTCCAAC^G GCTCAGGCCG TCATCACATC 1020

CTCAGTGCCG GCCCTCACGC CCTCACCCAG GCCCTGGCTA ATGGGCCCGA GGCTTAG 1077

(187) INFORMATION FOR SEQ ID NO: 186:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 358 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186: 

Met Pro Ser Val Ser Pro Ala Gly Pro Ser Ala Gly Ala Val Pro Asn
1 5 10 15

Ala Thr Ala Val Thr Thr Val Arg Thr Asn Ala Ser Gly Leu Glu Val
20 25 30

Pro Leu Phe His Leu Phe Ala Arg Leu Asp Glu Glu Leu His Gly Thr
35 40 45

Phe Pro Gly Leu Cys Val Ala Leu Met Ala Val His Gly Ala Ile Phe
50 55 60

Leu Ala Gly Leu Val Leu Asn Gly Leu Ala Leu Tyr Val Phe Cys Cys
65 70 75 80

Arg Thr Arg Ala Lys Thr Pro Ser Val Ile Tyr Thr Ile Asn Leu Val
85 90 95

Val Thr Asp Leu Leu Val Gly Leu Ser Leu Pro Thr Arg Phe Ala Val
100 105 110

Tyr Tyr Gly Ala Arg Gly Cys Leu Arg Cys Ala Phe Pro His Val Leu
115 120 125

Gly Tyr Phe Leu Asn Met His Cys Ser Ile Leu Phe Leu Thr Cys Ile
130 135 140

Cys Val Asp Arg Tyr Leu Ala Ile Val Arg Pro Glu Gly Ser Arg Ala
145 150 155 160

Cys Arg Gln Pro Ala Cys Ala Arg Ala Val Cys Ala Phe Val Trp Leu
165 170 175

Ala Ala Gly Ala Val Thr Leu Ser Val Leu Gly Val Thr Gly Ser Arg
180 185 190

Pro Cys Cys Arg Val Phe Ala Leu Thr Val Leu Glu Phe Leu Leu Pro
195 200 205

Leu Leu Val Ile Ser Val Phe Thr Gly Arg Ile Met Cys Ala Leu Ser
210 215 220

Arg Pro Gly Leu Leu His Gln Gly Arg Gln Arg Arg Val Arg Ala Lys
225 230 235 240

Gln Leu Leu Leu Thr Val Leu Ile Ile Phe Leu Val Cys Phe Thr Pro
245 250 255

Phe His Ala Arg Gln Val Ala Val Ala Leu Trp Pro Asp Met Pro His
260 265 270

His Thr Ser Leu Val Val Tyr His Val Ala Val Thr Leu Ser Ser Leu
275 280 285

Asn Ser Cys Met Asp Pro Ile Val Tyr Cys Phe Val Thr Ser Gly Phe
290 295 300

Gln Ala Thr Val Arg Gly Leu Phe Gly Gln His Gly Glu Arg Glu Pro
305 310 315 320

Ser Ser Gly Asp Val Val Ser Met His Arg Ser Ser Lys Gly Ser Gly
325 330 335

Arg His His Ile Leu Ser Ala Gly Pro His Ala Leu Thr Gln Ala Leu
340 345 350

Ala Asn Gly Pro Glu Ala
355

(188) INFORMATION FOR SEQ ID NO: 187: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1050 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187:

ATGAACTCCA CCTTGGATGG TAATCAGAGC AGCCACCCTT TTTGCCTCTT GGCATTTGGC 60

TATTTGGAAA CTGTCAATTT TTGCCTTTTG GAAGTATTGA TTATTGTCTT TCTAACTGTA 120

TTGATTATTT CTGGCAACAT CATTGTGATT TTTGTATTTC ACTGTGCACC TTTGTTGAAC 180

CATCACACTA CAAGTTATTT TATCCAGACT ATGGCATATG CTGACCTTTT TGTTGGGGTG 240

AGCTGCGTGG TCCCTTCTTT ATCACTCCTC CATCACCCCC TTCCAGTAGA GGAGTCCTTG 300

ACTTGCCAGA TATTTGGTTT TGTAGTATCA GTTCTGAAGA GCGTCTCCAT GGCTTCTCTG 360

GCCTGTATCA GCATTGATAG ATACATTGCC ATTACTAAAC CTTTAACCTA TAATACTCTG 420

GTTACACCCT GGAGACTACG CCTGTGTATT TTCCTGATTT GGCTATACTC GACCCTGGTC 480

TTCCTGCCTT CCTTTTTCCA CTGGGGCAAA CCTGGATATC ATGGAGATGT GTTTCAGTGG 540

TGTGCGGAGT CCTGGCACAC CGACTCCTAC TTCACCCTGT TCATCGTGAT GATGTTATAT 600

GCCCCAGCAG CCCTTATTGT CTGCTTCACC TATTTCAACA TCTTCCGCAT CTGCCAACAG 660

CACACAAAGG ATATCAGCGA AAGGCAAGCC CGCTTCAGCA GCCAGAGTGG GGAGACTGGG 720

GAAGTGCAGG CCTGTCCTGA TAAGCGCTAT AAAATGGTCC TGTTTCGAAT CACTAGTGTA 780

TTTTACATCC TCTGGTTGCC ATATATCATC TACTTCTTGT TGGAAAGCTC CACTGGCCAC 840

AGCAACCGCT TCGCATCCTT CTTGACCACC TGGCTTGCTA TTAGTAACAG TTTCTGCAAC 900

TGTGTAATTT ATAGTCTCTC CAACAGTGTA TTCCAAAGAG GACTAAAGCG CCTCTCAGGG 960

GCTATGTGTA CTTCTTGTGC AAGTCAGACT ACAGCCAACG ACCCTTACAC AGTTAGAAGC 1020

AAAGGCCCTC TTAATGGATG TCATATCTGA 1050 
(189) INFORMATION FOR SEQ ID NO: 188:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188:

Met Asn Ser Thr Leu Asp Gly Asn Gln Ser Ser His Pro Phe Cys Leu
1 5 10 15

Leu Ala Phe Gly Tyr Leu Glu Thr Val Asn Phe Cys Leu Leu Glu Val
20 25 30

Leu Ile Ile Val Phe Leu Thr Val Leu Ile Ile Ser Gly Asn Ile Ile
35 40 45

Val Ile Phe Val Phe His Cys Ala Pro Leu Leu Asn His His Thr Thr
50 55 60

Ser Tyr Phe Ile Gln Thr Met Ala Tyr Ala Asp Leu Phe Val Gly Val
65 70 75 80

Ser Cys Val Val Pro Ser Leu Ser Leu Leu His His Pro Leu Pro Val
85 90 95

Glu Glu Ser Leu Thr Cys Gln Ile Phe Gly Phe Val Val Ser Val Leu
100 105 110

Lys Ser Val Ser Met Ala Ser Leu Ala Cys Ile Ser Ile Asp Arg Tyr
115 120 125

Ile Ala Ile Thr Lys Pro Leu Thr Tyr Asn Thr Leu Val Thr Pro Trp
130 135 140

Arg Leu Arg Leu Cys Ile Phe Leu Ile Trp Leu Tyr Ser Thr Leu Val
145 150 155 160

Phe Leu Pro Ser Phe Phe His Trp Gly Lys Pro Gly Tyr His Gly Asp
165 170 175

Val Phe Gln Trp Cys Ala Glu Ser Trp His Thr Asp Ser Tyr Phe Thr
180 185 190

Leu Phe Ile Val Met Met Leu Tyr Ala Pro Ala Ala Leu Ile Val Cys
195 200 205

Phe Thr Tyr Phe Asn Ile Phe Arg Ile Cys Gln Gln His Thr Lys Asp
210 215 220

Ile Ser Glu Arg Gln Ala Arg Phe Ser Ser Gln Ser Gly Glu Thr Gly
225 230 235 240

Glu Val Gln Ala Cys Pro Asp Lys Arg Tyr Lys Met Val Leu Phe Arg
245 250 255

Ile Thr Ser Val Phe Tyr Ile Leu Trp Leu Pro Tyr Ile Ile Tyr Phe
260 265 270

Leu Leu Glu Ser Ser Thr Gly His Ser Asn Arg Phe Ala Ser Phe Leu
275 280 285

Thr Thr Trp Leu Ala Ile Ser Asn Ser Phe Cys Asn Cys Val Ile Tyr
290 295 300

Ser Leu Ser Asn Ser Val Phe Gln Arg Gly Leu Lys Arg Leu Ser Gly
305 310 315 320

Ala Met Cys Thr Ser Cys Ala Ser Gln Thr Thr Ala Asn Asp Pro Tyr
325 330 335

Thr Val Arg Ser Lys Gly Pro Leu Asn Gly Cys His Ile
340 345
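In this part of the listing, each nucleotide entry is paired with the protein entry that follows it. For example, SEQ ID NO: 187 encodes SEQ ID NO: 188 and SEQ ID NO: 189 encodes SEQ ID NO: 190. This means the LENGTH fields can be cross-checked against each other. An open reading frame of N base pairs whose last codon is a stop encodes N/3 - 1 residues: 1050 bp gives 349 amino acids, and 1302 bp gives 433.

The Python sketch below illustrates that check under stated assumptions:
- It uses the standard genetic code.
- It reads in frame from the first base.
- Any codon containing a character other than A, C, G or T (such as N) is reported as "X".

The helper names `translate` and `residues_expected` are hypothetical and are not part of the listing.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order at each position, so the 64-letter string lines up with
# TTT, TTC, TTA, TTG, TCT, ... GGG. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence in frame 1, stopping at the first stop codon.

    Whitespace (the listing's 10-base grouping) is ignored; codons containing
    anything other than A/C/G/T, such as N, are reported as "X".
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def residues_expected(bp_length):
    """Residues encoded by an ORF of bp_length bases whose last codon is a stop."""
    return bp_length // 3 - 1


print(translate("ATGAACTCCA CCTTGGATGG TAATCAGAGC"))
print(residues_expected(1050), residues_expected(1302))
```

Applied to the first 30 bases of SEQ ID NO: 187, `translate` returns MNSTLDGNQS. This matches the first ten residues of SEQ ID NO: 188 (Met Asn Ser Thr Leu Asp Gly Asn Gln Ser).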

(190) INFORMATION FOR SEQ ID NO: 189:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1302 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:

ATGTGTTTTT CTCCCATTCT GGAAATCAAC ATGCAGTCTG AATCTAACAT TACAGTGCGA 60

GATGACATTG ATGACATCAA CACCAATATG TACCAACCAC TATCATATCC GTTAAGCTTT 120

CAAGTGTCTC TCACCGGATT TCTTATGTTA GAAATTGTGT TGGGACTTGG CAGCAACCTC 180

ACTGTATTGG TACTTTACTG CATGAAATCC AACTTAATCA ACTCTGTCAG TAACATTATT 240

ACAATGAATC TTCATGTACT TGATGTAATA ATTTGTGTGG GATGTATTCC TCTAACTATA 300

GTTATCCTTC TGCTTTCACT GGAGAGTAAC ACTGCTCTCA TTTGCTGTTT CCATGAGGCT 360

TGTGTATCTT TTGCAAGTGT CTCAACAGCA ATCAACGTTT TTGCTATCAC TTTGGACAGA 420

TATGACATCT CTGTAAAACC TGCAAACCGA ATTCTGACAA TGGGCAGAGC TGTAATGTTA 480

ATGATATCCA TTTGGATTTT TTCTTTTTTC TCTTTCCTGA TTCCTTTTAT TGAGGTAAAT 540

TTTTTCAGTC TTCAAAGTGG AAATACCTGG GAAAACAAGA CACTTTTATG TGTCAGTACA 600

AATGAATACT ACACTGAACT GGGAATGTAT TATCACCTGT TAGTACAGAT CCCAATATTC 660

TTTTTCACTG TTGTAGTAAT GTTAATCACA TACACCAAAA TACTTCAGGC TCTTAATATT 720

CGAATAGGCA CAAGATTNTC AACAGGGCAG AAGAAGAAAG CAAGAAAGAA AAAGACAATT 780

TCTCTAACCA CACAACATGA GGCTACAGAC ATGTCACAAA GCAGTGGTGG GAGAAATGTA 840

GTCTTTGGTG TAAGAACTTC AGTTTCTGTA ATAATTGCCC TCCGGCGAGC TGTGAAACGA 900

CACCGTGAAC GACGAGAAAG ACAAAAGAGA GTCAAGAGGA TGTCTTTATT GATTATTTCT 960

ACATTTCTTC TCTGCTGGAC ACCAATTTCT GTTTTAAATA CCACCATTTT ATGTTTAGGC 1020

CCAAGTGACC TTTTAGTAAA ATTAAGATTG TGTTTTTTAG TCATGGCTTA TGGAACAACT 1080

ATATTTCACC CTCTATTATA TGCATTCACT AGACAAAAAT TTCAAAAGGT CTTGAAAAGT 1140

AANATGAAAA AGCGAGTTGT TTCTATAGTA GAAGCTGATC CCCTGCCTAA TAATGCTGTA 1200

ATACACAACT CTTGGATAGA TCCCAAAAGA AACAAAAAAA TTACCTTTGA AGATAGTGAA 1260

ATAAGAGAAA AACGTTTAGT GCCTCAGGTT GTCACAGACT AG 1302
(191) INFORMATION FOR SEQ ID NO: 190: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 433 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190:

Met Cys Phe Ser Pro Ile Leu Glu Ile Asn Met Gln Ser Glu Ser Asn
1 5 10 15

Ile Thr Val Arg Asp Asp Ile Asp Asp Ile Asn Thr Asn Met Tyr Gln
20 25 30

Pro Leu Ser Tyr Pro Leu Ser Phe Gln Val Ser Leu Thr Gly Phe Leu
35 40 45

Met Leu Glu Ile Val Leu Gly Leu Gly Ser Asn Leu Thr Val Leu Val
50 55 60

Leu Tyr Cys Met Lys Ser Asn Leu Ile Asn Ser Val Ser Asn Ile Ile
65 70 75 80

Thr Met Asn Leu His Val Leu Asp Val Ile Ile Cys Val Gly Cys Ile
85 90 95

Pro Leu Thr Ile Val Ile Leu Leu Leu Ser Leu Glu Ser Asn Thr Ala
100 105 110

WO 00/22129 PCT/US99/23938

Leu Ile Cys Cys Phe His Glu Ala Cys Val Ser Phe Ala Ser Val Ser
115 120 125

Thr Ala Ile Asn Val Phe Ala Ile Thr Leu Asp Arg Tyr Asp Ile Ser
130 135 140

Val Lys Pro Ala Asn Arg Ile Leu Thr Met Gly Arg Ala Val Met Leu
145 150 155 160

Met Ile Ser Ile Trp Ile Phe Ser Phe Phe Ser Phe Leu Ile Pro Phe
165 170 175

Ile Glu Val Asn Phe Phe Ser Leu Gln Ser Gly Asn Thr Trp Glu Asn
180 185 190

Lys Thr Leu Leu Cys Val Ser Thr Asn Glu Tyr Tyr Thr Glu Leu Gly
195 200 205

Met Tyr Tyr His Leu Leu Val Gln Ile Pro Ile Phe Phe Phe Thr Val
210 215 220

Val Val Met Leu Ile Thr Tyr Thr Lys Ile Leu Gln Ala Leu Asn Ile
225 230 235 240

Arg Ile Gly Thr Arg Phe Ser Thr Gly Gln Lys Lys Lys Ala Arg Lys
245 250 255

Lys Lys Thr Ile Ser Leu Thr Thr Gln His Glu Ala Thr Asp Met Ser
260 265 270

Gln Ser Ser Gly Gly Arg Asn Val Val Phe Gly Val Arg Thr Ser Val
275 280 285

Ser Val Ile Ile Ala Leu Arg Arg Ala Val Lys Arg His Arg Glu Arg
290 295 300

Arg Glu Arg Gln Lys Arg Val Lys Arg Met Ser Leu Leu Ile Ile Ser
305 310 315 320

Thr Phe Leu Leu Cys Trp Thr Pro Ile Ser Val Leu Asn Thr Thr Ile
325 330 335

Leu Cys Leu Gly Pro Ser Asp Leu Leu Val Lys Leu Arg Leu Cys Phe
340 345 350

Leu Val Met Ala Tyr Gly Thr Thr Ile Phe His Pro Leu Leu Tyr Ala
355 360 365

Phe Thr Arg Gln Lys Phe Gln Lys Val Leu Lys Ser Lys Met Lys Lys
370 375 380

Arg Val Val Ser Ile Val Glu Ala Asp Pro Leu Pro Asn Asn Ala Val
385 390 395 400

Ile His Asn Ser Trp Ile Asp Pro Lys Arg Asn Lys Lys Ile Thr Phe
405 410 415

Glu Asp Ser Glu Ile Arg Glu Lys Arg Leu Val Pro Gln Val Val Thr Asp
420 425 430
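The protein entries use three-letter codes, 16 residues per row, with position numbers on the line beneath each row. Converting rows to one-letter form and counting them is a quick way to confirm a LENGTH field. For instance, the final row of SEQ ID NO: 190 starts at residue 417 and holds 17 residues, so the protein ends at 433.

Below is a minimal Python sketch of that conversion. It assumes only the 20 standard three-letter codes appear in the rows. The function name `listing_to_one_letter` is hypothetical.

```python
# The 20 standard three-letter amino acid codes and their one-letter forms.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(lines):
    """Join listing rows of three-letter codes into a one-letter string.

    Blank lines and lines holding only position numbers (e.g. "420 425 430")
    are skipped. An unrecognized token raises KeyError.
    """
    residues = []
    for line in lines:
        tokens = line.split()
        if all(tok.isdigit() for tok in tokens):
            continue  # numbering line or blank line
        residues.extend(THREE_TO_ONE[tok] for tok in tokens)
    return "".join(residues)


# Last two rows of SEQ ID NO: 190 (positions 401 to 433).
tail_of_seq_190 = [
    "Ile His Asn Ser Trp Ile Asp Pro Lys Arg Asn Lys Lys Ile Thr Phe",
    "405 410 415",
    "",
    "Glu Asp Ser Glu Ile Arg Glu Lys Arg Leu Val Pro Gln Val Val Thr Asp",
    "420 425 430",
]
one_letter = listing_to_one_letter(tail_of_seq_190)
print(one_letter, 400 + len(one_letter))
```

Run on these rows, it returns a 33-residue string, which brings the count to 433. Any token outside the table raises `KeyError`, which flags a code that needs checking by hand.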



(192) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1209 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:
ATGTTGTGTC CTTCCAAGAC AGATGGCTCA GGGCACTCTG GTAGGATTCA CCAGGAAACT 60

CATGGAGAAG GGAAAAGGGA CAAGATTAGC AACAGTGAAG GGAGGGAGAA TGGTGGGAGA 120

GGATTCCAGA TGAACGGTGG GTCGCTGGAG GCTGAGCATG CCAGCAGGAT GTCAGTTCTC 180

AGAGCAAAGC CCATGTCAAA CAGCCAACGC TTGCTCCTTC TGTCCCCAGG ATCACCTCCT 240

CGCACGGGGA GCATCTCCTA CATCAACATC ATCATGCCTT CGGTGTTCGG CACCATCTGC 300

CTCCTGGGCA TCATCGGGT^ CTCCACGGTC ATCTTCGCGG TCGTGAAGAA GTCCAAGCTG 360 

CACTGGTGCA ACAACGTCCC CGACATCTTC ATCATCAACC TCTCGGTAGT AGATCTCCTC 420

TTTCTCCTGG GCATGCCCTT CATGATCCAC CAGCTCATGG GCAATGGGGT GTGGCACTTT 480

GGGGAGACCA TGTGCACCCT CATCACGGCC ATGGATGCCA ATAGTCAGTT CACCAGCACC 540

TACATCCTGA CCGCCATGGC CATTGACCGC TACCTGGCCA CTGTCCACCC CATCTCTTCC 600

ACGAAGTTCC GGAAGCCCTC TGTGGCCACC CTGGTGATCT GCCTCCTGTG GGCCCTCTCC 660

TTCATCAGCA TCACCCCTGT GTGGCTGTAT GCCAGACTCA TCCCCTTCCC AGGAGGTGCA 720

GTGGGCTGCG GCATACGCCT GCCCAACCCA GACACTGACC TCTACTGGTT CACCCTGTAC 780

CAGTTTTTCC TGGCCTTTGC CCTGCCTTTT GTGGTCATCA CAGCCGCATA CGTGAGGATC 840

CTGCAGCGCA TGACGTCCTC AGTGGCCCCC GCCTCCCAGC GCAGCATCCG GCTGCGGACA 900

AAGAGGGTGA AACGCACAGC CATCGCCATC TGTCTGGTCT TCTTTGTGTG CTGGGCACCC 960

TACTATGTGC TACAGCTGAC CCAGTTGTCC ATCAGCCGCC CGACCCTCAC CTTTGTCTAC 1020

TTATACAATG CGGCCATCAG CTTGGGCTAT GCCAACAGCT GCCTCAACCC CTTTGTGTAC 1080

ATCGTGCTCT GTGAGACGTT CCGCAAACGC TTGGTCCTGT CCGTGAAGCC TGCAGCCCAG 1140

GGGCAGCTTC GCGCTGTCAG CAACGCTCAG ACGGCTGACG AGGAGAGGAC AGAAAGCAAA 1200

GGCACCTCA 1209
(193) INFORMATION FOR SEQ ID NO: 192:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 402 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:

Met Leu Cys Pro Ser Lys Thr Asp Gly Ser Gly His Ser Gly Arg Ile
1 5 10 15

His Gln Glu Thr His Gly Glu Gly Lys Arg Asp Lys Ile Ser Asn Ser
20 25 30

Gln Gly Arg Glu Asn Gly Gly Arg Gly Phe Gln Met Asn Gly Gly Ser
35 40 45

Leu Glu Ala Glu His Ala Ser Arg Met Ser Val Leu Arg Ala Lys Pro
50 55 60

Met Ser Asn Ser Gln Arg Leu Leu Leu Leu Ser Pro Gly Ser Pro Pro
65 70 75 80

Arg Thr Gly Ser Ile Ser Tyr Ile Asn Ile Ile Met Pro Ser Val Phe
85 90 95

Gly Thr Ile Cys Leu Leu Gly Ile Ile Gly Asn Ser Thr Val Ile Phe
100 105 110

Ala Val Val Lys Lys Ser Lys Leu His Trp Cys Asn Asn Val Pro Asp
115 120 125

Ile Phe Ile Ile Asn Leu Ser Val Val Asp Leu Leu Phe Leu Leu Gly
130 135 140

Met Pro Phe Met Ile His Gln Leu Met Gly Asn Gly Val Trp His Phe
145 150 155 160

Gly Glu Thr Met Cys Thr Leu Ile Thr Ala Met Asp Ala Asn Ser Gln
165 170 175

Phe Thr Ser Thr Tyr Ile Leu Thr Ala Met Ala Ile Asp Arg Tyr Leu
180 185 190

Ala Thr Val His Pro Ile Ser Ser Thr Lys Phe Arg Lys Pro Ser Val
195 200 205

Ala Thr Leu Val Ile Cys Leu Leu Trp Ala Leu Ser Phe Ile Ser Ile
210 215 220

Thr Pro Val Trp Leu Tyr Ala Arg Leu Ile Pro Phe Pro Gly Gly Ala
225 230 235 240

Val Gly Cys Gly Ile Arg Leu Pro Asn Pro Asp Thr Asp Leu Tyr Trp
245 250 255

Phe Thr Leu Tyr Gln Phe Phe Leu Ala Phe Ala Leu Pro Phe Val Val
260 265 270

Ile Thr Ala Ala Tyr Val Arg Ile Leu Gln Arg Met Thr Ser Ser Val
275 280 285

Ala Pro Ala Ser Gln Arg Ser Ile Arg Leu Arg Thr Lys Arg Val Lys
290 295 300

Arg Thr Ala Ile Ala Ile Cys Leu Val Phe Phe Val Cys Trp Ala Pro
305 310 315 320

Tyr Tyr Val Leu Gln Leu Thr Gln Leu Ser Ile Ser Arg Pro Thr Leu
325 330 335

Thr Phe Val Tyr Leu Tyr Asn Ala Ala Ile Ser Leu Gly Tyr Ala Asn
340 345 350

Ser Cys Leu Asn Pro Phe Val Tyr Ile Val Leu Cys Glu Thr Phe Arg
355 360 365

Lys Arg Leu Val Leu Ser Val Lys Pro Ala Ala Gln Gly Gln Leu Arg
370 375 380

Ala Val Ser Asn Ala Gln Thr Ala Asp Glu Glu Arg Thr Glu Ser Lys
385 390 395 400

Gly Thr

(194) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:193: 
ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAC 60

GCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGGACCANC AGTACGTCAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGNT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGAGCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTGG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AANCACCAGG CCCGGGTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGGTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTGCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGNCTG CGGCCCCGNC GGCAGAAGGC GAAACGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAAGGT CTTCATCAGC 840

GTGCACCTCG TGCAGCGGAC GCAGCCTGGG GCCGCTCGCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCGCCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGGT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTGA 1128

(195) INFORMATION FOR SEQ ID NO: 194:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194: 

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala His Ala Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Lys Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Ala Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375

(196) INFORMATION FOR SEQ ID NO: 195: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 960 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195: 



ATGCCATTCC CAAACTGCTC AGCCCCCAGC ACTGTGGTNG CCACAGCTGT GGGTGTCTTG 60

CTGGGGCTGG AGTGTGGGCT GGGTCTGCTG GGCAACGCGG TGGCGNTGTG GACCTTCCTG 120

TTCCGGGTCA GGGTGTGGAA GCCGTACGCT GTCTACCTGC TCAACCTGGC NCTGGCTGAC 180

CTGCTGTTGG CTGCGTGCCT GCCTTTCCTG GCCGCCTTNT ACCTGAGCCT NCAGGCTTGG 240

CATCTGGGCC GTGTGGGCTG CTGGGCCCTG CGCTTCCTNN TGGANCTCAG NCGCAGCGTG 300

GGGATGGCCT TCCTGGCCGC CGTGGCTTTG GACCGGTANC TCCGTGTGGT CCACCCTCGG 360

CTTAAGGTCA ACCTGCTGTC TCCTCAGGCG GCCCTGGGGG TCTCGGGCCT CGTCTGGCTC 420

CTGATGGTCG CCCTCACCTG CCCGGGCTTG CTCATCTCTG AGGCCGCCCA GAACTCCACC 480

AGGTGCCACA GTTTCTACTC CAGGGCAGAC GGCTCCTTCA GCATCATCTG GCAGGAAGCA 540

CTCTCCTGCC TTCAGTTTGT CCTCCCCTTT GGCCTCATCG TGTTCTGCAA TGCAGGCATC 600

ATCAGGGCTC TCCAGAAAAG ACTCCGGGAG CCTGAGAAAC AGCCCAAGCT TCAGCGGGCC 660

AAGGCACTGG TCACCTTGGT GGTGGTGCTG TTTGCTCTGT GCTTTCTGCC CTGCTTCCTG 720

GCCAGAGTCC TGATGCACAT CTTCCAGAAT CTGGGGAGCT GCAGGGCCCT TTGTGCAGTG 780

GCTCATACCT CGGATGTCAC GGGCAGCCTC ACCTACCTGC ACAGTGTCGT CAACCCCGTG 840

GTATACTGCT TCTCCAGCCC CACCTTCAGG AGCTCCTATC GGAGGGTCTT CCACACCCTC 900

CGAGGCAAAG GGCAGGCAGC AGAGCCCCCA GATTTCAACC CCAGAGACTC CTATTCCTGA 960
(197) INFORMATION FOR SEQ ID NO: 196:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 319 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196:



Met Pro Phe Pro Asn Cys Ser Ala Pro Ser Thr Val Val Ala Thr Ala
1 5 10 15

Val Gly Val Leu Leu Gly Leu Glu Cys Gly Leu Gly Leu Leu Gly Asn
20 25 30

Ala Val Ala Leu Trp Thr Phe Leu Phe Arg Val Arg Val Trp Lys Pro
35 40 45

Tyr Ala Val Tyr Leu Leu Asn Leu Ala Leu Ala Asp Leu Leu Leu Ala
50 55 60

Ala Cys Leu Pro Phe Leu Ala Ala Phe Tyr Leu Ser Leu Gln Ala Trp
65 70 75 80

His Leu Gly Arg Val Gly Cys Trp Ala Leu Arg Phe Leu Leu Asp Leu
85 90 95

Ser Arg Ser Val Gly Met Ala Phe Leu Ala Ala Val Ala Leu Asp Arg
100 105 110

Tyr Leu Arg Val Val His Pro Arg Leu Lys Val Asn Leu Leu Ser Pro
115 120 125

Gln Ala Ala Leu Gly Val Ser Gly Leu Val Trp Leu Leu Met Val Ala
130 135 140

Leu Thr Cys Pro Gly Leu Leu Ile Ser Glu Ala Ala Gln Asn Ser Thr
145 150 155 160

Arg Cys His Ser Phe Tyr Ser Arg Ala Asp Gly Ser Phe Ser Ile Ile
165 170 175

Trp Gln Glu Ala Leu Ser Cys Leu Gln Phe Val Leu Pro Phe Gly Leu
180 185 190

Ile Val Phe Cys Asn Ala Gly Ile Ile Arg Ala Leu Gln Lys Arg Leu
195 200 205

Arg Glu Pro Glu Lys Gln Pro Lys Leu Gln Arg Ala Lys Ala Leu Val
210 215 220

Thr Leu Val Val Val Leu Phe Ala Leu Cys Phe Leu Pro Cys Phe Leu
225 230 235 240

Ala Arg Val Leu Met His Ile Phe Gln Asn Leu Gly Ser Cys Arg Ala
245 250 255

Leu Cys Ala Val Ala His Thr Ser Asp Val Thr Gly Ser Leu Thr Tyr
260 265 270

Leu His Ser Val Val Asn Pro Val Val Tyr Cys Phe Ser Ser Pro Thr
275 280 285

Phe Arg Ser Ser Tyr Arg Arg Val Phe His Thr Leu Arg Gly Lys Gly
290 295 300

Gln Ala Ala Glu Pro Pro Asp Phe Asn Pro Arg Asp Ser Tyr Ser
305 310 315

(198) INFORMATION FOR SEQ ID NO: 197:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1143 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197:

ATGGAGGAAG GTGGTGATTT TGACAACTAC TATGGGGCAG ACAACCAGTC TGAGTGTGAG 60

TACACAGACT GGAAATCCTC GGGGGCCCTC ATCCCTGCCA TCTACATGTT GGTCTTCCTC 120

CTGGGCACCA CGGGAAACGG TCTGGTGCTC TGGACCGTGT TTCGGAGCAG CCGGGAGAAG 180

AGGCGCTCAG CTGATATCTT CATTGCTAGC CTGGCGGTGG CTGACCTGAC CTTCGTGGTG 240

ACGCTGCCCC TGTGGGCTAC CTACACGTAC CGGGACTATG ACTGGCCCTT TGGGACCTTC 300

TTCTGCAAGC TCAGCAGCTA CCTCATCTTC GTCAACATGT ACGCCAGCGT CTTCTGCCTC 360

ACCGGCCTCA GCTTCGACCG CTACCTGGCC ATCGTGAGGC CAGTGGCCAA TGCTCGGCTG 420

AGGCTGCGGG TCAGCGGGGC CGTGGCCACG GCAGTTCTTT GGGTGCTGGC CGCCCTCCTG 480

GCCATGCCTG TCATGGTGTT ACGCACCACC GGGGACTTGG AGAACACCAC TAAGGTGCAG 540

TGCTACATGG ACTACTCCAT GGTGGCCACT GTGAGCTCAG AGTGGGCCTG GGAGGTGGGC 600

CTTGGGGTCT CGTCCACCAC CGTGGGCTTT GTGGTGCCCT TCACCATCAT GCTGACCTGT 660

TACTTCTTCA TCGCCCAAAC CATCGCTGGC CACTTCCGCA AGGAACGCAT CGAGGGCCTG 720

CGGAAGCGGC GCCGGCTTAA GAGCATCATC GTGGTGCTGG TGGTGACCTT TGCCCTGTGC 780

TGGATGCCCT ACCACCTGGT GAAGACGCTG TACATGCTGG GCAGCCTGCT GCACTGGCCC 840

TGTGACTTTG ACCTCTTCCT CATGAACATC TTCCCCTACT GCACCTGCAT CAGCTACGTC 900

AACAGCTGCC TCAACCCCTT CCTCTATGCC TTTTTCGACC CCCGCTTCCG CCAGGCCTGC 960

ACCTCCATGC TCTGCTGTGG CCAGAGCAGG TGCGCAGGCA CCTCCCACAG CAGCAGTGGG 1020

GAGAAGTCAG CCAGCTACTC TTCGGGGCAC AGCCAGGGGC CCGGCCCCAA CATCGGCAAG 1080

GGTGGAGAAC AGATGCACGA GAAATCCATC CCCTACAGCC AGGAGACCCT TGTGGTTGAC 1140

TAG 1143
(199) INFORMATION FOR SEQ ID NO: 198:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 380 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198:

Met Glu Glu Gly Gly Asp Phe Asp Asn Tyr Tyr Gly Ala Asp Asn Gln
1 5 10 15

Ser Glu Cys Glu Tyr Thr Asp Trp Lys Ser Ser Gly Ala Leu Ile Pro
20 25 30

Ala Ile Tyr Met Leu Val Phe Leu Leu Gly Thr Thr Gly Asn Gly Leu
35 40 45

Val Leu Trp Thr Val Phe Arg Ser Ser Arg Glu Lys Arg Arg Ser Ala
50 55 60

Asp Ile Phe Ile Ala Ser Leu Ala Val Ala Asp Leu Thr Phe Val Val
65 70 75 80

Thr Leu Pro Leu Trp Ala Thr Tyr Thr Tyr Arg Asp Tyr Asp Trp Pro
85 90 95

Phe Gly Thr Phe Phe Cys Lys Leu Ser Ser Tyr Leu Ile Phe Val Asn
100 105 110

Met Tyr Ala Ser Val Phe Cys Leu Thr Gly Leu Ser Phe Asp Arg Tyr
115 120 125

Leu Ala Ile Val Arg Pro Val Ala Asn Ala Arg Leu Arg Leu Arg Val
130 135 140

Ser Gly Ala Val Ala Thr Ala Val Leu Trp Val Leu Ala Ala Leu Leu
145 150 155 160

Ala Met Pro Val Met Val Leu Arg Thr Thr Gly Asp Leu Glu Asn Thr
165 170 175

Thr Lys Val Gln Cys Tyr Met Asp Tyr Ser Met Val Ala Thr Val Ser
180 185 190

Ser Glu Trp Ala Trp Glu Val Gly Leu Gly Val Ser Ser Thr Thr Val
195 200 205

Gly Phe Val Val Pro Phe Thr Ile Met Leu Thr Cys Tyr Phe Phe Ile
210 215 220

Ala Gln Thr Ile Ala Gly His Phe Arg Lys Glu Arg Ile Glu Gly Leu
225 230 235 240

Arg Lys Arg Arg Arg Leu Lys Ser Ile Ile Val Val Leu Val Val Thr
245 250 255

Phe Ala Leu Cys Trp Met Pro Tyr His Leu Val Lys Thr Leu Tyr Met
260 265 270

Leu Gly Ser Leu Leu His Trp Pro Cys Asp Phe Asp Leu Phe Leu Met
275 280 285

Asn Ile Phe Pro Tyr Cys Thr Cys Ile Ser Tyr Val Asn Ser Cys Leu
290 295 300

Asn Pro Phe Leu Tyr Ala Phe Phe Asp Pro Arg Phe Arg Gln Ala Cys
305 310 315 320

Thr Ser Met Leu Cys Cys Gly Gln Ser Arg Cys Ala Gly Thr Ser His
325 330 335

Ser Ser Ser Gly Glu Lys Ser Ala Ser Tyr Ser Ser Gly His Ser Gln
340 345 350

Gly Pro Gly Pro Asn Met Gly Lys Gly Gly Glu Gln Met His Glu Lys
355 360 365

Ser Ile Pro Tyr Ser Gln Glu Thr Leu Val Val Asp
370 375 380

(200) INFORMATION FOR SEQ ID NO: 199:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1119 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199:
ATGAACTACC CGCTTACGCT GGAAATGGAC CTCGAGAACC TGGAGGACCT GTTCTGGGAA 60

CTGGACAGAT TGGACAACTA TAACGACACC TCCCTGGTGG AAAATCATCT CTGCCCTGCC 120

ACAGAGGGTC CCCTCATGGC CTCCTTCAAG GCCGTGTTCG TGCCCGTGGC CTACAGCCTC 180

ATCTTCCTCC TGGGCGTGAT CGGCAACGTC CTGGTGCTGG TGATCCTGGA GCGGCACCGG 240

CAGACACGCA GTTCCACGGA GACCTTCCTG TTCCACCTGG CCGTGGCCGA CCTCCTGCTG 300

GTCTTCATCT TGCCCTTTGC CGTGGCCGAG GGCTCTGTGG GCTGGGTCCT GGGGACCTTC 360

CTCTGCAAAA CTGTGATTGC CCTGCACAAA GTCAACTTCT ACTGCAGCAG CCTGCTCCTG 420

GCCTGCATCG CCGTGGACCG CTACCTGGCC ATTGTCCACG CCGTCCATGC CTACCGCCAC 480

CGCCGCCTCC TCTCCATCCA CATCACCTGT GGGACCATCT GGCTGGTGGG CTTCCTCCTT 540

GCCTTGCCAG AGATTCTCTT CGCCAAAGTC AGCCAAGGCC ATCACAACAA CTCCCTGCCA 600

CGTTGCACCT TCTCCCAAGA GAACCAAGCA GAAACGCATG CCTGGTTCAC CTCCCGATTC 660

CTCTACCATG TGGCGGGATT CCTGCTGCCC ATGCTGGTGA TGGGCTGGTG CTACGTGGGG 720

GTAGTGCACA GGTTGCGCCA GGCCCAGCGG CGCCCTCAGC GGCAGAAGGC AAAAAGGGTG 780

GCCATCCTGG TGACAAGCAT CTTCTTCCTC TGCTGGTCAC CCTACCACAT CGTCATCTTC 840

CTGGACACCC TGGCGAGGCT GAAGGCCGTG GACAATACCT GCAAGCTGAA TGGCTCTCTC 900

GCCGTGGCCA TCACCATGTG TGAGTTCCTG GGCCTGGCCC ACTGCTGCCT CAACCCCATG 960

CTCTACACTT TCGCCGGCGT GAAGTTCCGC AGTGACCTGT CGCGGCTCCT GACCAAGCTG 1020

GGCTGTACCG GCCCTGCCTC CCTGTGCCAG CTCTTCCCTA GCTGGCGCAG GAGCAGTCTC 1080

TCTGAGTCAG AGAATGCCAC CTCTCTCACC ACGTTCTAG 1119
(201) INFORMATION FOR SEQ ID NO: 200:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 372 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200: 

Met Asn Tyr Pro Leu Thr Leu Glu Met Asp Leu Glu Asn Leu Glu Asp
1 5 10 15

Leu Phe Trp Glu Leu Asp Arg Leu Asp Asn Tyr Asn Asp Thr Ser Leu
20 25 30

Val Glu Asn His Leu Cys Pro Ala Thr Glu Gly Pro Leu Met Ala Ser
35 40 45

Phe Lys Ala Val Phe Val Pro Val Ala Tyr Ser Leu Ile Phe Leu Leu
50 55 60

Gly Val Ile Gly Asn Val Leu Val Leu Val Ile Leu Glu Arg His Arg
65 70 75 80

Gln Thr Arg Ser Ser Thr Glu Thr Phe Leu Phe His Leu Ala Val Ala
85 90 95

Asp Leu Leu Leu Val Phe Ile Leu Pro Phe Ala Val Ala Glu Gly Ser
100 105 110

Val Gly Trp Val Leu Gly Thr Phe Leu Cys Lys Thr Val Ile Ala Leu
115 120 125

His Lys Val Asn Phe Tyr Cys Ser Ser Leu Leu Leu Ala Cys Ile Ala
130 135 140

Val Asp Arg Tyr Leu Ala Ile Val His Ala Val His Ala Tyr Arg His
145 150 155 160

Arg Arg Leu Leu Ser Ile His Ile Thr Cys Gly Thr Ile Trp Leu Val
165 170 175

Gly Phe Leu Leu Ala Leu Pro Glu Ile Leu Phe Ala Lys Val Ser Gln
180 185 190

Gly His His Asn Asn Ser Leu Pro Arg Cys Thr Phe Ser Gln Glu Asn
195 200 205

Gln Ala Glu Thr His Ala Trp Phe Thr Ser Arg Phe Leu Tyr His Val
210 215 220

Ala Gly Phe Leu Leu Pro Met Leu Val Met Gly Trp Cys Tyr Val Gly
225 230 235 240

Val Val His Arg Leu Arg Gln Ala Gln Arg Arg Pro Gln Arg Gln Lys
245 250 255

Ala Lys Arg Val Ala Ile Leu Val Thr Ser Ile Phe Phe Leu Cys Trp
260 265 270

Ser Pro Tyr His Ile Val Ile Phe Leu Asp Thr Leu Ala Arg Leu Lys
275 280 285

Ala Val Asp Asn Thr Cys Lys Leu Asn Gly Ser Leu Pro Val Ala Ile
290 295 300

Thr Met Cys Glu Phe Leu Gly Leu Ala His Cys Cys Leu Asn Pro Met
305 310 315 320

Leu Tyr Thr Phe Ala Gly Val Lys Phe Arg Ser Asp Leu Ser Arg Leu
325 330 335

Leu Thr Lys Leu Gly Cys Thr Gly Pro Ala Ser Leu Cys Gln Leu Phe
340 345 350

Pro Ser Trp Arg Arg Ser Ser Leu Ser Glu Ser Glu Asn Ala Thr Ser
355 360 365

Leu Thr Thr Phe
370

(202) INFORMATION FOR SEQ ID NO: 201: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201:

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAG 60

CCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GAAGCGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCACCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTAG 1128
(203) INFORMATION FOR SEQ ID NO: 202:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:202: 

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala Gln Pro Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Lys Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Thr Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375
(204) INFORMATION FOR SEQ ID NO:203: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1137 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:

ATGGACCTGG GGAAACCAAT GAAAAGCGTG CTGGTGGTGG CTCTCCTTGT CATTTTCCAG 60

GTATGCCTGT GTCAAGATGA GGTCACGGAC GATTACATCG GAGACAACAC CACAGTGGAC 120

TACACTTTGT TCGAGTCTTT GTGCTCCAAG AAGGACGTGC GGAACTTTAA AGCCTGGTTC 180

CTCCCTATCA TGTACTCCAT CATTTGTTTC GTGGGCCTAC TGGGCAATGG GCTGGTCGTG 240

TTGACCTATA TCTATTTCAA GAGGCTCAAG ACCATGACCG ATACCTACCT GCTCAACCTG 300

GCGGTGGCAG ACATCCTCTT CCTCCTGACC CTTCCCTTCT GGGCCTACAG CGCGGCCAAG 360

TCCTGGGTCT TCGGTGTCCA CTTTTGCAAG CTCATCTTTG CCATCTACAA GATGAGCTTC 420




TTCAGTGGCA TGCTCCTACT TCTTTGCATC AGCATTGACC GCTACGTGGC CATCGTCCAG 480

GCTGTCTCAG CTCACCGCCA CCGTGCCCGC GTCCTTCTCA TCAGCAAGCT GTCCTGTGTG 540

GGCATCTGGA TACTAGCCAC AGTGCTCTCC ATCCCAGAGC TCCTGTACAG TGACCTCCAG 600

AGGAGCAGCA GTGAGCANGC GATGCGATGC TCTCTCATCA CAGAGCATGT GGAGGCCTTT 660

ATCACCATCC AGGTGGCCCA GATGGTGATC GGCTTTCTGG TCCCCCTGCT GGCCATGAGC 720

TTCTGTTACC TTGTCATCAT CCGCACCCTG CTCCAGGCAC GCAACTTTGA GCGCAACAAG 780

GCCAAAAAGG TGATCATCGC TGTGGTCGTG GTCTTCATAG TCTTCCAGCT GCCCTACAAT 840

GGGGTGGTCC TGGCCCAGAC GGTGGCCAAC TTCAACATCA CCAGTAGCAC CTGTGAGCTC 900

AGTAAGCAAC TCAACATCGC CTACGACGTC ACCTACAGCC TGGCCTGCGT CCGCTGCTGC 960

GTCAACCCTT TCTTGTACGC CTTCATCGGC GTCAAGTTCC GCAACGATCT CTTCAAGCTC 1020

TTCAAGGACC TGGGCTGCCT CAGCCAGGAG CAGCTCCGGC AGTGGTCTTC CTGTCGGCAC 1080

ATCCGGCGCT CCTCCATGAG TGTGGAGGCC GAGACCACCA CCACCTTCTC CCCATAG 1137
(205) INFORMATION FOR SEQ ID NO: 204: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 378 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204: 

Met Asp Leu Gly Lys Pro Met Lys Ser Val Leu Val Val Ala Leu Leu
1 5 10 15

Val Ile Phe Gln Val Cys Leu Cys Gln Asp Glu Val Thr Asp Asp Tyr
20 25 30

Ile Gly Asp Asn Thr Thr Val Asp Tyr Thr Leu Phe Glu Ser Leu Cys
35 40 45

Ser Lys Lys Asp Val Arg Asn Phe Lys Ala Trp Phe Leu Pro Ile Met
50 55 60

Tyr Ser Ile Ile Cys Phe Val Gly Leu Leu Gly Asn Gly Leu Val Val
65 70 75 80

Leu Thr Tyr Ile Tyr Phe Lys Arg Leu Lys Thr Met Thr Asp Thr Tyr
85 90 95

Leu Leu Asn Leu Ala Val Ala Asp Ile Leu Phe Leu Leu Thr Leu Pro
100 105 110

Phe Trp Ala Tyr Ser Ala Ala Lys Ser Trp Val Phe Gly Val His Phe
115 120 125

Cys Lys Leu Ile Phe Ala Ile Tyr Lys Met Ser Phe Phe Ser Gly Met
130 135 140

Leu Leu Leu Leu Cys Ile Ser Ile Asp Arg Tyr Val Ala Ile Val Gln
145 150 155 160

Ala Val Ser Ala His Arg His Arg Ala Arg Val Leu Leu Ile Ser Lys
165 170 175

Leu Ser Cys Val Gly Ile Trp Ile Leu Ala Thr Val Leu Ser Ile Pro
180 185 190

Glu Leu Leu Tyr Ser Asp Leu Gln Arg Ser Ser Ser Glu Gln Ala Met
195 200 205

Arg Cys Ser Leu Ile Thr Glu His Val Glu Ala Phe Ile Thr Ile Gln
210 215 220

Val Ala Gln Met Val Ile Gly Phe Leu Val Pro Leu Leu Ala Met Ser
225 230 235 240

Phe Cys Tyr Leu Val Ile Ile Arg Thr Leu Leu Gln Ala Arg Asn Phe
245 250 255

Glu Arg Asn Lys Ala Lys Lys Val Ile Ile Ala Val Val Val Val Phe
260 265 270

Ile Val Phe Gln Leu Pro Tyr Asn Gly Val Val Leu Ala Gln Thr Val
275 280 285

Ala Asn Phe Asn Ile Thr Ser Ser Thr Cys Glu Leu Ser Lys Gln Leu
290 295 300

Asn Ile Ala Tyr Asp Val Thr Tyr Ser Leu Ala Cys Val Arg Cys Cys
305 310 315 320

Val Asn Pro Phe Leu Tyr Ala Phe Ile Gly Val Lys Phe Arg Asn Asp
325 330 335

Leu Phe Lys Leu Phe Lys Asp Leu Gly Cys Leu Ser Gln Glu Gln Leu
340 345 350

Arg Gln Trp Ser Ser Cys Arg His Ile Arg Arg Ser Ser Met Ser Val
355 360 365

Glu Ala Glu Thr Thr Thr Thr Phe Ser Pro
370 375



(206) INFORMATION FOR SEQ ID NO: 205: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1086 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205:

ATGGATATAC AAATGGCAAA CAATTTTACT CCGCCCTCTG CAACTCCTCA GGGAAATGAC 60

TGTGACCTCT ATGCACATCA CAGCACGGCC AGGATAGTAA TGCCTCTGCA TTACAGCCTC 120

GTCTTCATCA TTGGGCTCGT GGGAAACTTA CTAGCCTTGG TCGTCATTGT TCAAAACAGG 180

AAAAAAATCA ACTCTACCAC CCTCTATTCA ACAAATTTGG TGATTTCTGA TATACTTTTT 240

ACCACGGCTT TGCCTACACG AATAGCCTAC TATGCAATGG GCTTTGACTG GAGAATCGGA 300

GATGCCTTGT GTAGGATAAC TGCGCTAGTG TTTTACATCA ACACATATGC AGGTGTGAAC 360

TTTATGACCT GCCTGAGTAT TGACCGCTTC ATTGCTGTGG TGCACCCTCT ACGCTACAAC 420

AAGATAAAAA GGATTGAACA TGCAAAAGGC GTGTGCATAT TTGTCTGGAT TCTAGTATTT 480

GCTCAGACAC TCCCACTCCT CATCAACCCT ATGTCAAAGC AGGAGGCTGA AAGGATTACA 540

TGCATGGAGT ATCCAAACTT TGAAGAAACT AAATCTCTTC CCTGGATTCT GCTTGGGGCA 600

TGTTTCATAG GATATGTACT TCCACTTATA ATCATTCTCA TCTGCTATTC TCAGATCTGC 660

TGCAAACTCT TCAGAACTGC CAAACAAAAC CCACTCACTG AGAAATCTGG TGTAAACAAA 720

AAGGCTAAAA ACACAATTAT TCTTATTATT GTTGTGTTTG TTCTCTGTTT CACACCTTAC 780

CATGTTGCAA TTATTCAACA TATGATTAAG AAGCTTCGTT TCTCTAATTT CCTGGAATGT 840

AGCCAAAGAC ATTCGTTCCA GATTTCTCTG CACTTTACAG TATGCCTGAT GAACTTCAAT 900

TGCTGCATGG ACCCTTTTAT CTACTTCTTT GCATGTAAAG GGTATAAGAG AAAGGTTATG 960

AGGATGCTGA AACGGCAAGT CAGTGTATCG ATTTCTAGTG CTGTGAAGTC AGCCCCTGAA 1020

GAAAATTCAC GTGAAATGAC AGAAACGCAG ATGATGATAC ATTCCAAGTC TTCAAATGGA 1080

AAGTGA 1086
(207) INFORMATION FOR SEQ ID NO: 206:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 361 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206:

Met Asp Ile Gln Met Ala Asn Asn Phe Thr Pro Pro Ser Ala Thr Pro
1 5 10 15

Gln Gly Asn Asp Cys Asp Leu Tyr Ala His His Ser Thr Ala Arg Ile
20 25 30

Val Met Pro Leu His Tyr Ser Leu Val Phe Ile Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Ala Leu Val Val Ile Val Gln Asn Arg Lys Lys Ile Asn
50 55 60

Ser Thr Thr Leu Tyr Ser Thr Asn Leu Val Ile Ser Asp Ile Leu Phe
65 70 75 80

Thr Thr Ala Leu Pro Thr Arg Ile Ala Tyr Tyr Ala Met Gly Phe Asp
85 90 95

Trp Arg Ile Gly Asp Ala Leu Cys Arg Ile Thr Ala Leu Val Phe Tyr
100 105 110

Ile Asn Thr Tyr Ala Gly Val Asn Phe Met Thr Cys Leu Ser Ile Asp
115 120 125

Arg Phe Ile Ala Val Val His Pro Leu Arg Tyr Asn Lys Ile Lys Arg
130 135 140

Ile Glu His Ala Lys Gly Val Cys Ile Phe Val Trp Ile Leu Val Phe
145 150 155 160

Ala Gln Thr Leu Pro Leu Leu Ile Asn Pro Met Ser Lys Gln Glu Ala
165 170 175

Glu Arg Ile Thr Cys Met Glu Tyr Pro Asn Phe Glu Glu Thr Lys Ser
180 185 190

Leu Pro Trp Ile Leu Leu Gly Ala Cys Phe Ile Gly Tyr Val Leu Pro
195 200 205

Leu Ile Ile Ile Leu Ile Cys Tyr Ser Gln Ile Cys Cys Lys Leu Phe
210 215 220

Arg Thr Ala Lys Gln Asn Pro Leu Thr Glu Lys Ser Gly Val Asn Lys
225 230 235 240

Lys Ala Lys Asn Thr Ile Ile Leu Ile Ile Val Val Phe Val Leu Cys
245 250 255

Phe Thr Pro Tyr His Val Ala Ile Ile Gln His Met Ile Lys Lys Leu
260 265 270 

Arg Phe Ser Asn Phe Leu Glu Cys Ser Gln Arg His Ser Phe Gln Ile
275 280 285

Ser Leu His Phe Thr Val Cys Leu Met Asn Phe Asn Cys Cys Met Asp
290 295 300

Pro Phe Ile Tyr Phe Phe Ala Cys Lys Gly Tyr Lys Arg Lys Val Met
305 310 315 320

Arg Met Leu Lys Arg Gln Val Ser Val Ser Ile Ser Ser Ala Val Lys
325 330 335

Ser Ala Pro Glu Glu Asn Ser Arg Glu Met Thr Glu Thr Gln Met Met
340 345 350

Ile His Ser Lys Ser Ser Asn Gly Lys
355 360



(208) INFORMATION FOR SEQ ID NO: 207: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1446 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: 

ATGCGGTGGC TGTGGCCCCT GGCTGTCTCT CTTGCTGTGA TTTTGGCTGT GGGGCTAAGC 60

AGGGTCTCTG GGGGTGCCCC CCTGCACCTG GGCAGGCACA GAGCCGAGAC CCAGGAGCAG 120

CAGAGCCGAT CCAAGAGGGG CACCGAGGAT GAGGAGGCCA AGGGCGTGCA GCAGTATGTG 180

CCTGAGGAGT GGGCGGAGTA CCCCCGGCCC ATTCACCCTG CTGGCCTGCA GCCAACCAAG 240

CCCTTGGTGG CCACCAGCCC TAACCCCGAC AAGGATGGGG GCACCCCAGA CAGTGGGCAG 300

GAACTGAGGG GCAATCTGAC AGGGGCACCA GGGCAGAGGC TACAGATCCA GAACCCCCTG 360

TATCCGGTGA CCGAGAGCTC CTACAGTGCC TATGCCATCA TGCTTCTGGC GCTGGTGGTG 420

TTTGCGGTGG GCATTGTGGG CAACCTGTCG GTCATGTGCA TCGTGTGGCA CAGCTACTAC 480

CTGAAGAGCG CCTGGAACTC CATCCTTGCC AGCCTGGCCC TCTGGGATTT TCTGGTCCTC 540

TTTTTCTGCC TCCCTATTGT CATCTTCAAC GAGATCACCA AGCAGAGGCT ACTGGGTGAC 600

GTTTCTTGTC GTGCCGTGCC CTTCATGGAG GTCTCCTCTC TGGGAGTCAC GACTTTCAGC 660

CTCTGTGCCC TGGGCATTGA CCGCTTCCAC GTGGCCACCA GCACCCTGCC CAAGGTGAGG 720

CCCATCGAGC GGTGCCAATC CATCCTGGCC AAGTTGGCTG TCATCTGGGT GGGCTCCATG 780

ACGCTGGCTG TGCCTGAGCT CCTGCTGTGG CAGCTGGCAC AGGAGCCTGC CCCCACCATG 840

GGCACCCTGG ACTCATGCAT CATGAAACCC TCAGCCAGCC TGCCCGAGTC CCTGTATTCA 900

CTGGTGATGA CCTACCAGAA CGCCCGCATG TGGTGGTACT TTGGCTGCTA CTTCTGCCTG 960

CCCATCCTCT TCACAGTCAC CTGCCAGCTG GTGACATGGC GGGTGCGAGG CCCTCCAGGG 1020

AGGAAGTCAG AGTGCAGGGC CAGCAAGCAC GAGCAGTGTG AGAGCCAGCT CAAGAGCACC 1080

GTGGTGGGCC TGACCGTGGT CTACGCCTTC TGCACCCTCC CAGAGAACGT CTGCAACATC 1140

GTGGTGGCCT ACCTCTCCAC CGAGCTGACC CGCCAGACCC TGGACCTCCT GGGCCTCATC 1200

AACCAGTTCT CCACCTTCTT CAAGGGCGCC ATCACCCCAG TGCTGCTCCT TTGCATCTGC 1260

AGGCCGCTGG GCCAGGCCTT CCTGGACTGC TGCTGCTCCT GCTGCTGTGA GGAGTGCGGC 1320

GGGGCTTCGG AGGCCTCTGC TGCCAATGGG TCGGACAACA AGCTCAAGAC CGAGGTGTCC 1380

TCTTCCATCT ACTTCCACAA GCCCAGGGAG TCACCCCCAC TCCTGCCCCT GGGCACACCT 1440

TGCTGA 1446
(209) INFORMATION FOR SEQ ID NO: 208:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 481 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208:

Met Arg Trp Leu Trp Pro Leu Ala Val Ser Leu Ala Val Ile Leu Ala
1 5 10 15

Val Gly Leu Ser Arg Val Ser Gly Gly Ala Pro Leu His Leu Gly Arg
20 25 30

His Arg Ala Glu Thr Gln Glu Gln Gln Ser Arg Ser Lys Arg Gly Thr
35 40 45

Glu Asp Glu Glu Ala Lys Gly Val Gln Gln Tyr Val Pro Glu Glu Trp
50 55 60

Ala Glu Tyr Pro Arg Pro Ile His Pro Ala Gly Leu Gln Pro Thr Lys
65 70 75 80

Pro Leu Val Ala Thr Ser Pro Asn Pro Asp Lys Asp Gly Gly Thr Pro
85 90 95

Asp Ser Gly Gln Glu Leu Arg Gly Asn Leu Thr Gly Ala Pro Gly Gln
100 105 110

Arg Leu Gln Ile Gln Asn Pro Leu Tyr Pro Val Thr Glu Ser Ser Tyr
115 120 125

Ser Ala Tyr Ala Ile Met Leu Leu Ala Leu Val Val Phe Ala Val Gly
130 135 140

Ile Val Gly Asn Leu Ser Val Met Cys Ile Val Trp His Ser Tyr Tyr
145 150 155 160

Leu Lys Ser Ala Trp Asn Ser Ile Leu Ala Ser Leu Ala Leu Trp Asp
165 170 175

Phe Leu Val Leu Phe Phe Cys Leu Pro Ile Val Ile Phe Asn Glu Ile
180 185 190

Thr Lys Gln Arg Leu Leu Gly Asp Val Ser Cys Arg Ala Val Pro Phe
195 200 205

Met Glu Val Ser Ser Leu Gly Val Thr Thr Phe Ser Leu Cys Ala Leu
210 215 220

Gly Ile Asp Arg Phe His Val Ala Thr Ser Thr Leu Pro Lys Val Arg
225 230 235 240

Pro Ile Glu Arg Cys Gln Ser Ile Leu Ala Lys Leu Ala Val Ile Trp
245 250 255

Val Gly Ser Met Thr Leu Ala Val Pro Glu Leu Leu Leu Trp Gln Leu
260 265 270

Ala Gln Glu Pro Ala Pro Thr Met Gly Thr Leu Asp Ser Cys Ile Met
275 280 285

Lys Pro Ser Ala Ser Leu Pro Glu Ser Leu Tyr Ser Leu Val Met Thr
290 295 300

Tyr Gln Asn Ala Arg Met Trp Trp Tyr Phe Gly Cys Tyr Phe Cys Leu
305 310 315 320

Pro Ile Leu Phe Thr Val Thr Cys Gln Leu Val Thr Trp Arg Val Arg
325 330 335

Gly Pro Pro Gly Arg Lys Ser Glu Cys Arg Ala Ser Lys His Glu Gln
340 345 350

Cys Glu Ser Gln Leu Lys Ser Thr Val Val Gly Leu Thr Val Val Tyr
355 360 365

Ala Phe Cys Thr Leu Pro Glu Asn Val Cys Asn Ile Val Val Ala Tyr
370 375 380

Leu Ser Thr Glu Leu Thr Arg Gln Thr Leu Asp Leu Leu Gly Leu Ile
385 390 395 400 

Asn Gln Phe Ser Thr Phe Phe Lys Gly Ala Ile Thr Pro Val Leu Leu
405 410 415

Leu Cys Ile Cys Arg Pro Leu Gly Gln Ala Phe Leu Asp Cys Cys Cys
420 425 430

Cys Cys Cys Cys Glu Glu Cys Gly Gly Ala Ser Glu Ala Ser Ala Ala
435 440 445

Asn Gly Ser Asp Asn Lys Leu Lys Thr Glu Val Ser Ser Ser Ile Tyr
450 455 460

Phe His Lys Pro Arg Glu Ser Pro Pro Leu Leu Pro Leu Gly Thr Pro
465 470 475 480

Cys



(210) INFORMATION FOR SEQ ID NO: 209:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1101 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:

ATGTGGAACG CGACGCCCAG CGAAGAGCCG GGGTTCAACC TCACACTGGC CGACCTGGAC 60

TGGGATGCTT CCCCCGGCAA CGACTCGCTG GGCGACGAGC TGCTGCAGCT CTTCCCCGCG 120

CCGCTGCTGG CGGGCGTCAC AGCCACCTGC GTGGCACTCT TCGTGGTGGG TATCGCTGGC 180

AACCTGCTCA CCATGCTGGT GGTGTCGCGC TTCCGCGAGC TGCGCACCAC CACCAACCTC 240

TACCTGTCCA GCATGGCCTT CTCCGATCTG CTCATCTTCC TCTGCATGCC CCTGGACCTC 300

GTTCGCCTCT GGCAGTACCG GCCCTGGAAC TTCGGCGACC TCCTCTGCAA ACTCTTCCAA 360

TTCGTCAGTG AGAGCTGCAC CTACGCCACG GTGCTCACCA TCACAGCGCT GAGCGTCGAG 420

CGCTACTTCG CCATCTGCTT CCCACTCCGG GCCAAGGTGG TGGTCACCAA GGGGCGGGTG 480

AAGCTGGTCA TCTTCGTCAT CTGGGCCGTG GCCTTCTGCA GCGCCGGGCC CATCTTCGTG 540

CTAGTCGGGG TGGAGCACGA GAACGGCACC GACCCTTGGG ACACCAACGA GTGCCGCCCC 600

ACCGAGTTTG CGGTGCGCTC TGGACTGCTC ACGGTCATGG TGTGGGTGTC CAGCATCTTC 660

TTCTTCCTTC CTGTCTTCTG TCTCACGGTC CTCTACAGTC TCATCGGCAG GAAGCTGTGG 720

CGGAGGAGGC GCGGCGATGC TGTCGTGGGT GCCTCGCTCA GGGACCAGAA CCACAAGCAA 780

ACCAAGAAAA TGCTGGCTGT AGTGGTGTTT GCCTTCATCC TCTGCTGGCT CCCCTTCCAC 840

GTAGGGCGAT ATTTATTTTC CAAATCCTTT GAGCCTGGCT CCTTGGACAT TGCTCAGATC 900

AGCCAGTACT GCAACCTCGT GTCCTTTGTC CTCTTCTACC TCAGTGCTGC CATCAACCCC 960

ATTCTGTACA ACATCATGTC CAAGAAGTAC CGGGTGGCAG TGTTCAGACT TCTGGGATTC 1020

GAACCCTTCT CCCAGAGAAA GCTCTCCACT CTGAAAGATG AAAGTTCTCG GGCCTGGACA 1080

GAATCTAGTA TTAATACATG A 1101
(211) INFORMATION FOR SEQ ID NO: 210: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 366 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210: 

Met Trp Asn Ala Thr Pro Ser Glu Glu Pro Gly Phe Asn Leu Thr Leu 
1 5 10 15

Ala Asp Leu Asp Trp Asp Ala Ser Pro Gly Asn Asp Ser Leu Gly Asp
20 25 30

Glu Leu Leu Gln Leu Phe Pro Ala Pro Leu Leu Ala Gly Val Thr Ala
35 40 45

Thr Cys Val Ala Leu Phe Val Val Gly Ile Ala Gly Asn Leu Leu Thr
50 55 60

Met Leu Val Val Ser Arg Phe Arg Glu Leu Arg Thr Thr Thr Asn Leu
65 70 75 80

Tyr Leu Ser Ser Met Ala Phe Ser Asp Leu Leu Ile Phe Leu Cys Met
85 90 95

Pro Leu Asp Leu Val Arg Leu Trp Gln Tyr Arg Pro Trp Asn Phe Gly
100 105 110

Asp Leu Leu Cys Lys Leu Phe Gln Phe Val Ser Glu Ser Cys Thr Tyr
115 120 125

Ala Thr Val Leu Thr Ile Thr Ala Leu Ser Val Glu Arg Tyr Phe Ala
130 135 140

Ile Cys Phe Pro Leu Arg Ala Lys Val Val Val Thr Lys Gly Arg Val
145 150 155 160

Lys Leu Val Ile Phe Val Ile Trp Ala Val Ala Phe Cys Ser Ala Gly
165 170 175

Pro Ile Phe Val Leu Val Gly Val Glu His Glu Asn Gly Thr Asp Pro
180 185 190

Trp Asp Thr Asn Glu Cys Arg Pro Thr Glu Phe Ala Val Arg Ser Gly
195 200 205

Leu Leu Thr Val Met Val Trp Val Ser Ser Ile Phe Phe Phe Leu Pro
210 215 220

Val Phe Cys Leu Thr Val Leu Tyr Ser Leu Ile Gly Arg Lys Leu Trp
225 230 235 240

Arg Arg Arg Arg Gly Asp Ala Val Val Gly Ala Ser Leu Arg Asp Gln
245 250 255

Asn His Lys Gln Thr Lys Lys Met Leu Ala Val Val Val Phe Ala Phe
260 265 270

Ile Leu Cys Trp Leu Pro Phe His Val Gly Arg Tyr Leu Phe Ser Lys
275 280 285

Ser Phe Glu Pro Gly Ser Leu Glu Ile Ala Gln Ile Ser Gln Tyr Cys
290 295 300

Asn Leu Val Ser Phe Val Leu Phe Tyr Leu Ser Ala Ala Ile Asn Pro
305 310 315 320

Ile Leu Tyr Asn Ile Met Ser Lys Lys Tyr Arg Val Ala Val Phe Arg
325 330 335

Leu Leu Gly Phe Glu Pro Phe Ser Gln Arg Lys Leu Ser Thr Leu Lys
340 345 350

Asp Glu Ser Ser Arg Ala Trp Thr Glu Ser Ser Ile Asn Thr
355 360 365

(212) INFORMATION FOR SEQ ID NO: 211:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1842 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211: 

ATGCGAGCCC CGGGCGCGCT TCTCGCCCGC ATGTCGCGGC TACTGCTTCT GCTACTGCTC 60

AAGGTGTCTG CCTCTTCTGC CCTCGGGGTC GCCCCTGCGT CCAGAAACGA AACTTGTCTG 120

GGGGAGAGCT GTGCACCTAC AGTGATCCAG CGCCGCGGCA GGGACGCCTG GGGACCGGGA 180 

AATTCTGCAA r;AGA'„^GTTCT GCGAGCCCGA GCACCCAGCG AGGAGCAGGG GGCAGCGTTT 240

CTTGCGGGAC CCTCCTGGGA CCTGCCGGCG GCCCCGGGC': GTC-iAGCCGGC TGCAGGCAGA 300

GGGGCGGAGG CGTCGGCAGC CGGACCCCCG GGACCTCCAA CCAGGCCACC TGGCCCCTGG 360

AGGTGGAAAG GTGCTCGGGG TCAGGAGCCT TCTGAAACTT TGGGGAGAGG GAACCCCACG 420

GCCCTCCAGC TCTTCCTTCA GATCTCAGAG GAGGAAGAGA AGGGTCCCAG AGGCGCTGGC 480

ATTTCCGGGG GTAGCCAGGA GCAGAGTGTG AAGACAGT.;-: CCGGAGCCAG CGATCTTTTT 540

TACTGGCCAA GGAGAGCCGG GAAACTCCAG GCTTCCCAGG ACAAGGCCCT GTCCAAGACG 600

GCCAATGGAC TGGCGGGGCA CGAAGGGTGG ACAATTGCAC TGCGGGGCCG GGCGCTGGCC 660

CAGAATGGAT CCTTGGGTGA AGGAATCCAT GAGCCTGGGL- ^'VC'^CJCCCG GGGAAACAG :: 720

ACGAACCGGC GTGTGAGACT GAAGAACGCC TTCTACCCGG TGAGCGAGGA GTCCTATGGA 780

GCCTACGCGG TCATGTGTCT GTCCGTGGTG ATCTTCGGvlA C^":i^:GGATCAT TGGCAACCTG 840

GCGGTGATGT GCATGGTGTG CCACAACTAC TACATGCGGA G^AJrCTCCAA CTCCCTCTTG 900

GCCAACCTGG CCTTCTGGGA CTTTCTCATC atctt.:tt"'t irCcrrrcGCT GGTCATCTTC 960

CACGAGCTGA CCAAGAAGTG GCTGCTGGAG GACTTrT';:::T i^.::aa "^atcgt GCCCTATATA 1020

GAGGTCGCCT CTCTGGGAGT CACCACTTTC ACCTTA'rG:u 'JTC'TJTGCAT AGACCGCTTT 1080

CGTGCTGCCA CCAACGTACA GATGTACTAC GAAATGAT:C^ SiA/iAVTGTl'C CTCAACAACT 1140

GCCAAACTTG CTGTTATATG GGTGGGAGCT CTATTGTT\:j CACirrCAGA AGTTGTTCTC 1200

CGCCAGCTGA GCAAGGAGGA TTTGGGGTTT AGTGGCCGAG CTCCGGCAGA AAGGTGCATT 1260

ATTAAGATCT CTCCTGATTT ACCAGACACC ATCTATGTrC TAGCCCTCAC CTACGACAGT 1320

GCGAGACTGT GGTGGTATTT TGGCTGTTAC TTTTGTTTGC C':ACGGTTTT CACCATCACC 1380

TGCTCTCTAG TGACTGCGAG GAAAATCCGC AAAGCAGAGA AAGCCTGTAC CCGAGGGAAT 1440

AAACGGCAGA TTCAACTAGA GAGTCAGATG AAGTGTACAG TAGTGGCACT GACCATTTTA 1500

TATGGATTTT GCATTATTCC TGAAAATATC TGCAACATTG TTACTGCCTA CATGGCTACA 1560

GGGGTTTCAC AGCAGACAAT GGACCTCCTT AATATCATCA G:CAGTTCCT TTTGTTCTTT 1620

AAGTCCTGTG TCACCCCAGT CCTCCTTTTC TGTCTCTGGA AACCCTTCAG TCGGGCCTTC 1680

ATGGAGTGCT GCTGCTGTTG CTGTGAGGAA TGCATTCAGA AGTCTTCAAC GGTGACCAGT 1740

GATGACAATG ACAACGAGTA CACCACGGAA CTCGAACTCT CGCCTTTCAG TACCATACGC 1800

CGTGAAATGT CCACTTTTGC TTCTGTCGGA ACTCATTGCT GA 1842
(213) INFORMATION FOR SEQ ID NO: 212:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 613 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212:

Met Arg Ala Pro Gly Ala Leu Leu Ala Arg Met Ser Arg Leu Leu Leu 
1 5 10 15

Leu Leu Leu Leu Lys Val Ser Ala Ser Ser Ala Leu Gly Val Ala Pro
20 25 30

Ala Ser Arg Asn Glu Thr Cys Leu Gly Glu Ser Cys Ala Pro Thr Val
35 40 45

Ile Gln Arg Arg Gly Arg Asp Ala Trp Gly Pro Gly Asn Ser Ala Arg
50 55 60

Asp Val Leu Arg Ala Arg Ala Pro Arg Glu Glu Gln Gly Ala Ala Phe
65 70 75 80

Leu Ala Gly Pro Ser Trp Asp Leu Pro Ala Ala Pro Gly Arg Asp Pro
85 90 95

Ala Ala Gly Arg Gly Ala Glu Ala Ser Ala Ala Gly Pro Pro Gly Pro
100 105 110

Pro Thr Arg Pro Pro Gly Pro Trp Arg Trp Lys Gly Ala Arg Gly Gln
115 120 125

Glu Pro Ser Glu Thr Leu Gly Arg Gly Asn Pro Thr Ala Leu Gln Leu
130 135 140

Phe Leu Gln Ile Ser Glu Glu Glu Glu Lys Gly Pro Arg Gly Ala Gly
145 150 155 160

Ile Ser Gly Arg Ser Gln Glu Gln Ser Val Lys Thr Val Pro Gly Ala
165 170 175

Ser Asp Leu Phe Tyr Trp Pro Arg Arg Ala Gly Lys Leu Gln Gly Ser
180 185 190

His His Lys Pro Leu Ser Lys Thr Ala Asn Gly Leu Ala Gly His Glu
195 200 205

Gly Trp Thr Ile Ala Leu Pro Gly Arg Ala Leu Ala Gln Asn Gly Ser
210 215 220

Leu Gly Glu Gly Ile His Glu Pro Gly Gly Pro Arg Arg Gly Asn Ser
225 230 235 240

Thr Asn Arg Arg Val Arg Leu Lys Asn Pro Phe Tyr Pro Leu Thr Glu
245 250 255

Glu Ser Tyr Gly Ala Tyr Ala Val Met Cys Leu Ser Val Val Ile Phe
260 265 270

Gly Thr Gly Ile Ile Gly Asn Leu Ala Val Met Cys Ile Val Cys His
275 280 285

Asn Tyr Tyr Met Arg Ser Ile Ser Asn Ser Leu Leu Ala Asn Leu Ala
290 295 300

Phe Trp Asp Phe Leu Ile Ile Phe Phe Cys Leu Pro Leu Val Ile Phe
305 310 315 320

His Glu Leu Thr Lys Lys Trp Leu Leu Glu Asp Phe Ser Cys Lys Ile
325 330 335

Val Pro Tyr Ile Glu Val Ala Ser Leu Gly Val Thr Thr Phe Thr Leu
340 345 350

Cys Ala Leu Cys Ile Asp Arg Phe Arg Ala Ala Thr Asn Val Gln Met
355 360 365

Tyr Tyr Glu Met Ile Glu Asn Cys Ser Ser Thr Thr Ala Lys Leu Ala
370 375 380

Val Ile Trp Val Gly Ala Leu Leu Leu Ala Leu Pro Glu Val Val Leu
385 390 395 400

Arg Gln Leu Ser Lys Glu Asp Leu Gly Phe Ser Gly Arg Ala Pro Ala
405 410 415

Glu Arg Cys Ile Ile Lys Ile Ser Pro Asp Leu Pro Asp Thr Ile Tyr
420 425 430

Val Leu Ala Leu Thr Tyr Asp Ser Ala Arg Leu Trp Trp Tyr Phe Gly
435 440 445

Cys Tyr Phe Cys Leu Pro Thr Leu Phe Thr Ile Thr Cys Ser Leu Val
450 455 460

Thr Ala Arg Lys Ile Arg Lys Ala Glu Lys Ala Cys Thr Arg Gly Asn
465 470 475 480

Lys Arg Gln Ile Gln Leu Glu Ser Gln Met Lys Cys Thr Val Val Ala
485 490 495

Leu Thr Ile Leu Tyr Gly Phe Cys Ile Ile Pro Glu Asn Ile Cys Asn
500 505 510

Ile Val Thr Ala Tyr Met Ala Thr Gly Val Ser Gln Gln Thr Met Asp
515 520 525

Leu Leu Asn Ile Ile Ser Gln Phe Leu Leu Phe Phe Lys Ser Cys Val
530 535 540

Thr Pro Val Leu Leu Phe Cys Leu Cys Lys Pro Phe Ser Arg Ala Phe
545 550 555 560

Met Glu Cys Cys Cys Cys Cys Cys Glu Glu Cys Ile Gln Lys Ser Ser
565 570 575

Thr Val Thr Ser Asp Asp Asn Asp Asn Glu Tyr Thr Thr Glu Leu Glu
580 585 590

Leu Ser Pro Phe Ser Thr Ile Arg Arg Glu Met Ser Thr Phe Ala Ser
595 600 605

Val Gly Thr His Cys
610

(214) INFORMATION FOR SEQ ID NO: 213:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1248 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213:

ATGGTTTTTG CTCACAGAAT GGATAACAGC AAGCCACATT TGATTATTCC TACACTTCTG 60

GTGCCCCTCC AAAACCGCAG CTGCACTGAA ACAGCCACAC CTCTGCCAAG CCAATACCTG 120

ATGGAATTAA GTGAGGAGCA CAGTTGGATG AGCAACCAAA CAGACCTTCA CTATGTGCTG 180

AT^CCCGGGG AAGTGGCCAC AGCCAGCATC TTCTTTGGGA TTCTGTGGTT GTTTTCTATC 240

TTCGGCAATT CCCTGGTTTG TTTGGTCATC CATAGGAGTA GGAGGACTCA GTCTACCACC 300

AACTACTTTG TGGTCTCCAT GGCATGTGCT GACCTTCTCA TCAGCGTTGC CAGCACGCCT 360

TTCGTCCTGC TCCAGTTCAC CACTGGAAGG TGGACGCTGG GTAGTGCAAC GTGCAAGGTT 420

GTGCGATATT TTCAATATCT CACTCCAGGT GTCCAGATCT ACGTTCTCCT CTCCATCTGC 480

ATAGACCGGT TCTACACCAT CGTCTATCCT CTGAGCTTCA AGGTGTCCAG AGAAAAAGCC 540

AAGAAAATGA TTGCGGCATC GTGGATCTTT GATGCAGGCT TTGTGACCCC TGTGCTCTTT 600

TTCTATGGCT CCAACTGGGA CAGTCATTGT AACTATTTCC TCCCCTCCTC TTGGGAAGGC 660

ACTGCCTACA CTGTCATCCA CTTCTTGGTG GGCTTTGTGA TTCCATCTGT CCTCATAATT 720

TTATTTTACC AAAAGGTCAT AAAATATATT TGGAGAATAG GCACAGATGG CCGAACGGTG 780

AGGAGGACAA TGAACATTGT CCCTCGGACA AAAGTGAAAA CTAAAAAGAT GTTCCTCATT 840

TTAAATCTGT TGTTTTTGCT CTCCTGGCTG GCTTTTCATG TAGCTCAGCT ATGGCACCCC 900

CATGAACAAG ACTATAAGAA AAGTTCCCTT GTTTTCACAG CTATCACATG GATATCCTTT 960

AGTTCTTCAG CCTCTAA/.CG TACTCTGTAT TCAATTTATA ATGCCAATTT TCGGAGAGGG 1020

ATGAAAGAGA CTTTTTG':AT GTCCTCTATG AAATGTTACC GAAGCAATGC CTATACTATC 1080

ACAACAAGTT CAAGGATGGC I'AAAAAAAAC TACGTTGGGA TTTCAGAAAT CGCTTCCATG 1140

GCCAAAACTA TTACCAAAGA CTCGATCTAT GACTCATTTG ACAGAGAAGC CAAGGAAAAA 1200

AAGCTTGCTT GGCCCATTAA CTCAAATCCA CCAAATACTT TTGTCTAA 1248

(215) INFORMATION FOR SEQ ID NO: 214:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 415 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214:

Met Val Phe Ala His Arg Met Asp Asn Ser Lys Pro His Leu Ile Ile
1 5 10 15

Pro Thr Leu Leu Val Pro Leu Gln Asn Arg Ser Cys Thr Glu Thr Ala
20 25 30

Thr Pro Leu Pro Ser Gln Tyr Leu Met Glu Leu Ser Glu Glu His Ser
35 40 45

Trp Met Ser Asn Gln Thr Asp Leu His Tyr Val Leu Lys Pro Gly Glu
50 55 60

Val Ala Thr Ala Ser Ile Phe Phe Gly Ile Leu Trp Leu Phe Ser Ile
65 70 75 80

Phe Gly Asn Ser Leu Val Cys Leu Val Ile His Arg Ser Arg Arg Thr
85 90 95

Gln Ser Thr Thr Asn Tyr Phe Val Val Ser Met Ala Cys Ala Asp Leu
100 105 110

Leu Ile Ser Val Ala Ser Thr Pro Phe Val Leu Leu Gln Phe Thr Thr
115 120 125

Gly Arg Trp Thr Leu Gly Ser Ala Thr Cys Lys Val Val Arg Tyr Phe
130 135 140

Gln Tyr Leu Thr Pro Gly Val Gln Ile Tyr Val Leu Leu Ser Ile Cys
145 150 155 160

Ile Asp Arg Phe Tyr Thr Ile Val Tyr Pro Leu Ser Phe Lys Val Ser
165 170 175

Arg Glu Lys Ala Lys Lys Met Ile Ala Ala Ser Trp Ile Phe Asp Ala
180 185 190

Gly Phe Val Thr Pro Val Leu Phe Phe Tyr Gly Ser Asn Trp Asp Ser
195 200 205

His Cys Asn Tyr Phe Leu Pro Ser Ser Trp Glu Gly Thr Ala Tyr Thr
210 215 220

Val Ile His Phe Leu Val Gly Phe Val Ile Pro Ser Val Leu Ile Ile
225 230 235 240

Leu Phe Tyr Gln Lys Val Ile Lys Tyr Ile Trp Arg Ile Gly Thr Asp
245 250 255

Gly Arg Thr Val Arg Arg Thr Met Asn Ile Val Pro Arg Thr Lys Val
260 265 270

Lys Thr Lys Lys Met Phe Leu Ile Leu Asn Leu Leu Phe Leu Leu Ser
275 280 285

Trp Leu Pro Phe His Val Ala Gln Leu Trp His Pro His Glu Gln Asp
290 295 300

Tyr Lys Lys Ser Ser Leu Val Phe Thr Ala Ile Thr Trp Ile Ser Phe
305 310 315 320

Ser Ser Ser Ala Ser Lys Pro Thr Leu Tyr Ser Ile Tyr Asn Ala Asn
325 330 335

Phe Arg Arg Gly Met Lys Glu Thr Phe Cys Met Ser Ser Met Lys Cys
340 345 350

Tyr Arg Ser Asn Ala Tyr Thr Ile Thr Thr Ser Ser Arg Met Ala Lys
355 360 365

Lys Asn Tyr Val Gly Ile Ser Glu Ile Pro Ser Met Ala Lys Thr Ile
370 375 380

Thr Lys Asp Ser Ile Tyr Asp Ser Phe Asp Arg Glu Ala Lys Glu Lys
385 390 395 400

Lys Leu Ala Trp Pro Ile Asn Ser Asn Pro Pro Asn Thr Phe Val
405 410 415 

(216) INFORMATION FOR SEQ ID NO: 215:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1842 base pairs 




(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215:

ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 60

CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 120

GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA GAAGCTCCGC 180

AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 240

CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 300

TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTT 360

GCAATCGCTA TGAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC 420

AGTGTGCGCA ATACCTGCAT CTACCTGCTC ATCACCTGGA TCATGACCGT CCTGGCTGTC 480

CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTG 540

AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 600

CTCCTCATCG TGGGTTTCTG CTACGTGAGG ATCTGGACCA AAGTGCTGGC GGCCCGTGAC 660

CCTGCAGGGC AGAATCCTGA CAACCAACTT GCTGAGGTTC GCAATAAACT AACCATGTTT 720

GTGATCTTCC TCCTCTTTGC AGTGTGCTGG TGCCCTATCA ACGTGCTCAC TGTCTTGGTG 780

GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840

TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900

TTCGGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCTCT 960

GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020

CATGCTCGCG ACCAAGCTCG TGAACAAGAC CGTGCCCATG CCTGTCCTGC TGTGGAGGAA 1080

ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGCAGCTGG CCACCCCGAC 1140

CGTGCCTCTG GCCACCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200

TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260

GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320

CCTGCCTCTG TCCATTTCAA GGCTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380

AAGCCTGACT CTGTTCATTT CAAGCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440

CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAATG CTGCCACCAG CCACCCTAAA 1500

CCCATCAAGC CACCTACCAG CCATGCTGAG CCCACCACTG CTGAGTATGC CAAGCCTGCC 1560

ACTACCAGCC ACCCTAAGCC CGCTGCTGCT GACAACCCTG AGCTCTCTGC CTCCCATTGC 1620

CCCGAGATCC CTGCCATTGC CCACCCTGTG TCTGACGACA GTGACCTCCC TGAGTCGGCC 1680

TCTAGCCCTG CCGCTGGGCC CACCAAGCCT GCTGCCAGCC AGCTGGAGTC TGACACCATC 1740

GCTGACCTTC CTGACCCTAC TGTAGTCACT ACCAGTACCA ATGATTACCA TGATGTCGTG 1800

GTTGTTGATG TTGAAGATGA TCCTGATGAA ATGGCTGTGT GA 1842
(217) INFORMATION FOR SEQ ID NO: 216:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 613 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216:

Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys
1 5 10 15

Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe
20 25 30

Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met
35 40 45

Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn
50 55 60

Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr
65 70 75 80

Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu
85 90 95

Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val
100 105 110

Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys
115 120 125

Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn
130 135 140

Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val
145 150 155 160

Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr
165 170 175

Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile
180 185 190

Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr
195 200 205

Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln
210 215 220

Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Lys Leu Thr Met Phe
225 230 235 240

Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu
245 250 255

Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro
260 265 270

Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys
275 280 285

Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu
290 295 300

Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Ser
305 310 315 320

Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala
325 330 335

Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala
340 345 350

His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val
355 360 365

Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly
370 375 380

His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala
385 390 395 400

Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly
405 410 415

His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro
420 425 430

Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Ala
435 440 445

Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser
450 455 460

Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His
465 470 475 480

His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Asn Ala Ala Thr
485 490 495

Ser His Pro Lys Pro Ile Lys Pro Ala Thr Ser His Ala Glu Pro Thr
500 505 510

Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His Pro Lys Pro Ala
515 520 525

Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys Pro Glu Ile Pro
530 535 540

Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu Pro Glu Ser Ala
545 550 555 560

Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala Ser Gln Leu Glu
565 570 575

Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val Val Thr Thr Ser
580 585 590

Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val Glu Asp Asp Pro
595 600 605

Asp Glu Met Ala Val 
610 

(218) INFORMATION FOR SEQ ID NO: 217:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1854 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217:

ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 6 0 

CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 12 0 

GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAG.^^CAA GAAGCTCCGi^ 18 0 

AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 240 

CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 3 00 

TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTG 3 60 
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GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCA CAOCC TCCAGTACGA ACGGATCTTC 12 0 

AGTGTGCGCA ATACCTGCAT CTACCTGGTC ATCACCTGGA TCATGACCGT GCTGGCTGTC 40 0 

CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTC S4 0 

AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 60 0 

CTCCTCATC:^ TGGGTTTCTG CTACGTGAGG ATCTGGACCA .^^GTGCTGGC GGCCCGTGAC 66 0 

COTGCAGGG? AGAATCGTCA CAACCAACTT GCTGAGGTTC GCAATA7VACT AACCATGTTT 720 

GTGATCTTC? TCCTCTTTGC AGTGTGCTGG TGCCCTATCA ACGTGGTCAC TGTCTTGGTG '^'^G 

GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840

TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900

TTCCGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCTGT 960

GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020

CATGCTCGCG ACCAAGCTCG TGAACAAGAC GGTGCCCATG CCTGTCCTGC TGTGGAGGA;^. 1080

ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGGAGCTGG CCACCCCGAC 1140

CGTGCCTCTG GCCAGCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200

TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260

GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320

CCTGCCTCTG TCCATTTCAA GGCTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380

AAGCCTGACT CTGTTCATTT CAACCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440

CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAGTG CTGCCACCAG CCACCCTAAA 1500

CCCACCACTG GCCAGATCAA GCCAGCTACC AGCCATGCTG AGCCCACCAC TGCTGACTAT 1560

CCCAAGCCTG CCACTACCAG CCACCCTAAG CCCACTGCTG CTGACAACCC TGAGCTCTCT 1620

GCCTCCCATT GCCCCGAGAT CCCTGCCATT GCCCACCCTG TGTCTGACGA CAGTGACCTC 1680

CCTGAGTCGG CCTCTAGCCC TGCCGCTGGG CCCACCAAGC CTGCTGCCAG CCAGCTGGAG 1740

TCTGACACCA TCGCTGACCT TCCTGACCCT ACTGTAGTCA CTACCAGTAC CAATGATTAC 1800

CATGATGTCG TGGTTGTTGA TGTTGAAGAT GATCCTGATG AAATGGCTGT GTGA 1854

(219) INFORMATION FOR SEQ ID NO: 218:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 617 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218:

Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys
1 5 10 15

Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe
20 25 30

Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met
35 40 45

Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn
50 55 60

Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr
65 70 75 80

Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu
85 90 95

Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val
100 105 110

Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys
115 120 125

Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn
130 135 140

Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val
145 150 155 160

Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr
165 170 175

Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile
180 185 190

Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr
195 200 205

Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln
210 215 220

Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Lys Leu Thr Met Phe
225 230 235 240

Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu
245 250 255

Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro
260 265 270

Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys
275 280 285 

Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu
290 295 300

Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Ser
305 310 315 320

Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala
325 330 335

Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala
340 345 350

His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val
355 360 365

Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly
370 375 380

His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala
385 390 395 400

Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly
405 410 415

His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro
420 425 430

Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Ala
435 440 445

Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser
450 455 460

Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His
465 470 475 480

His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Ser Ala Ala Thr
485 490 495

Ser His Pro Lys Pro Thr Thr Gly His Ile Lys Pro Ala Thr Ser His
500 505 510

Ala Glu Pro Thr Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His
515 520 525

Pro Lys Pro Thr Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys
530 535 540 

Pro Glu Ile Pro Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu
545 550 555 560

Pro Glu Ser Ala Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala
565 570 575

Ser Gln Leu Glu Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val
580 585 590

Val Thr Thr Ser Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val
595 600 605

Glu Asp Asp Pro Asp Glu Met Ala Val
610 615



(220) INFORMATION FOR SEQ ID NO: 219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1548 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219:

ATGGGACATA ACGGGAGCTG GATCTCTCCA AATGCCAGCG AGCCGCACAA CGCGTCCGGC 60

GCCGAGGCTG CGGGTGTGAA CCGCAGCGCG CTCGGGGAGT TCGGCGAGGC GCAGCTGTAC 120

CGCCAGTTCA CCACCACCGT GCAGGTCGTC ATCTTCATAG GCTCGCTGCT CGGAAACTTC 180

ATGGTGTTAT GGTCAACTTG CCGCACAACC GTGTTCAAAT CTGTCACCAA CAGGTTCATT 240

AAAAACCTGG CCTGCTCGGG GATTTGTGCC AGCCTGGTCT GTGTGCCCTT CGACATCATC 300

CTCAGCACCA GTCCTCACTG TTGCTGGTGG ATCTACACCA TGCTCTTCTG CAAGGTCGTC 360

AAATTTTTGC ACAAAGTATT CTGCTCTGTG ACCATCCTCA GCTTCCCTGC TATTGCTTTG 420

GACAGGTACT ACTCAGTCCT CTATCCACTG GAGAGGAAAA TATCTGATGC CAAGTCCCGT 480

GAACTGGTGA TGTACATCTG GGCCCATGCA GTGGTGGCCA GTGTCCCTGT GTTTGCAGTA 540

ACCAATGTGG CTGACATCTA TGCCACGTCC ACCTGCACGG AAGTCTGGAG CAACTCCTTG 600

GGCCACCTGG TGTACGTTCT GGTGTATAAC ATCACCACGG TCATTGTGCC TGTGGTGGTG 660

GTGTTCCTCT TCTTGATACT GATCCGACGG GCCCTGAGTG CCAGCCAGAA GAAGAAGGTC 720

ATCATAGCAG CGCTCCGGAC CCCACAGAAC ACCATCTCTA TTCCCTATGC CTCCCAGCGG 780

GAGGCCGAGC TGAAAGCCAC CCTGCTCTCC ATGGTGATGG TCTTCATCTT GTGTAGCGTG 840

CCCTATGCCA CCCTGGTCGT CTACCAGACT GTGCTCAATG TCCCTGACAC TTCCGTCTTC 900

TTGCTGCTCA CTGCTGTTTG GCTGCCCAAA GTCTCCCTGC TGGCAAACCC TGTTCTCTTT 960

CTTACTGTGA ACA^^J^TCTGT GCGCAAGTGC TTGATAGGGA CCCTGGTGCA ACTACACCAC 1020

CGGTACAGTC GCCGTAATGT GGTCAGTACA GGGAGTGGCA TGGCTGAGGC CAGCCTGGAA 1080

CCCAGCATAC GCTCGGGTAG CCAGCTCCTG GAGATGTTCC ACATTGGGCA GCAGCAGATC 1140

TTTAAGCCCA GAGAGGATGA GGAAGAGAGT GAGGCCAAGT ACATTGGCTC AGCTGACTTC 1200

CAGGCCAAGG AGATATTTAG GACCTGCCTG GAGGGAGAGC AGGGGCCACA GTTTGCGCCC 1260

TCTGCCCCAC CCCTGAGCAG AGTGGACTCT GTATCCCAGG TGGCACCGGC AGCCCCTGTG 1320

GAACCTGAAA CATTCCCTGA TAAGTATTCC CTGCAGTTTG GCTTTGGGCC TTTTGAGTTG 1380

CCTCCTCAGT GGCTCTCAGA GACCCGAAAC AGCAAGAAGC GGCTGCTTCC CCGCTTGGGC 1440

AACACCCCAG AAGAGCTGAT CCAGACAAAG GTGCCCAAGG TAGGCAGGGT GGAGCGGAAG 1500

ATGAGCAGAA ACAATAAAGT GAGCATTTTT CCAAAGGTGG ATTCCTAG 1548

(221) INFORMATION FOR SEQ ID NO: 220:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 515 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220:

Met Gly His Asn Gly Ser Trp Ile Ser Pro Asn Ala Ser Glu Pro His
1 5 10 15

Asn Ala Ser Gly Ala Glu Ala Ala Gly Val Asn Arg Ser Ala Leu Gly
20 25 30

Glu Phe Gly Glu Ala Gln Leu Tyr Arg Gln Phe Thr Thr Thr Val Gln
35 40 45

Val Val Ile Phe Ile Gly Ser Leu Leu Gly Asn Phe Met Val Leu Trp
50 55 60

Ser Thr Cys Arg Thr Thr Val Phe Lys Ser Val Thr Asn Arg Phe Ile
65 70 75 80

Lys Asn Leu Ala Cys Ser Gly Ile Cys Ala Ser Leu Val Cys Val Pro
85 90 95

Phe Asp Ile Ile Leu Ser Thr Ser Pro His Cys Cys Trp Trp Ile Tyr
100 105 110

Thr Met Leu Phe Cys Lys Val Val Lys Phe Leu His Lys Val Phe Cys
115 120 125

Ser Val Thr Ile Leu Ser Phe Pro Ala Ile Ala Leu Asp Arg Tyr Tyr
130 135 140

Ser Val Leu Tyr Pro Leu Glu Arg Lys Ile Ser Asp Ala Lys Ser Arg
145 150 155 160

Glu Leu Val Met Tyr Ile Trp Ala His Ala Val Val Ala Ser Val Pro
165 170 175

Val Phe Ala Val Thr Asn Val Ala Asp Ile Tyr Ala Thr Ser Thr Cys
180 185 190

Thr Glu Val Trp Ser Asn Ser Leu Gly His Leu Val Tyr Val Leu Val
195 200 205

Tyr Asn Ile Thr Thr Val Ile Val Pro Val Val Val Val Phe Leu Phe
210 215 220

Leu Ile Leu Ile Arg Arg Ala Leu Ser Ala Ser Gln Lys Lys Lys Val
225 230 235 240

Ile Ile Ala Ala Leu Arg Thr Pro Gln Asn Thr Ile Ser Ile Pro Tyr
245 250 255

Ala Ser Gln Arg Glu Ala Glu Leu Lys Ala Thr Leu Leu Ser Met Val
260 265 270

Met Val Phe Ile Leu Cys Ser Val Pro Tyr Ala Thr Leu Val Val Tyr
275 280 285

Gln Thr Val Leu Asn Val Pro Asp Thr Ser Val Phe Leu Leu Leu Thr
290 295 300

Ala Val Trp Leu Pro Lys Val Ser Leu Leu Ala Asn Pro Val Leu Phe
305 310 315 320

Leu Thr Val Asn Lys Ser Val Arg Lys Cys Leu Ile Gly Thr Leu Val
325 330 335

Gln Leu His His Arg Tyr Ser Arg Arg Asn Val Val Ser Thr Gly Ser
340 345 350

Gly Met Ala Glu Ala Ser Leu Glu Pro Ser Ile Arg Ser Gly Ser Gln
355 360 365

Leu Leu Glu Met Phe His Ile Gly Gln Gln Gln Ile Phe Lys Pro Thr
370 375 380

Glu Asp Glu Glu Glu Ser Glu Ala Lys Tyr Ile Gly Ser Ala Asp Phe
385 390 395 400

WO 00/22129 PCT/US99/23938

Gln Ala Lys Glu Ile Phe Ser Thr Cys Leu Glu Gly Glu Gln Gly Pro
405 410 415

Gln Phe Ala Pro Ser Ala Pro Pro Leu Ser Thr Val Asp Ser Val Ser
420 425 430

Gln Val Ala Pro Ala Ala Pro Val Glu Pro Glu Thr Phe Pro Asp Lys
435 440 445

Tyr Ser Leu Gln Phe Gly Phe Gly Pro Phe Glu Leu Pro Pro Gln Trp
450 455 460

Leu Ser Glu Thr Arg Asn Ser Lys Lys Arg Leu Leu Pro Pro Leu Gly
465 470 475 480

Asn Thr Pro Glu Glu Leu Ile Gln Thr Lys Val Pro Lys Val Gly Arg
485 490 495

Val Glu Arg Lys Met Ser Arg Asn Asn Lys Val Ser Ile Phe Pro Lys
500 505 510 

Val Asp Ser 
515 

(222) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221:

ATGAATCGGC ACCATCTGCA GGATCACTTT CTGGAAATAG ACAAGAAGAA CTGCTGTGTG 60

TTCCGAGATG ACTTCATTGC CAAGGTGTTG CCGCCGGTGT TGGGGCTGGA GTTTATCTTT 120

GGGCTTCTGG GCAATGGCCT TGCCCTGTGG ATTTTCTGTT TCCACCTCAA GTCCTGGAAA 180

TCCAGCCGGA TTTTCCTGTT CAACCTGGCA GTAGCTGACT TTCTACTGAT CATCTGCCTG 240

CCGTTCGTGA TGGACTACTA TGTGCGGCGT TCAGACTGGA AGTTTGGGGA CATCCCTTGC 300

CGGCTGGTGC TCTTCATGTT TGCCATGAAC CGCCAGGGCA GCATCATCTT CCTCACGGTG 360

GTGGCGGTAG ACAGGTATTT CCGGGTGGTC CATCCCCACC ACGCCCTGAA CAAGATCTCC 420

AATTGGACAG CAGCCATCAT CTCTTGCCTT CTGTGGGGCA TCACTGTTGG CCTAACAGTC 480

CACCTCCTGA AGAAGAAGTT GCTGATCCAG AATGGCCCTG CAAATGTGTG CATCAGCTTC 540

AGCATCTGCC ATACCTTCCG GTGGCACGAA GCTATGTTCC TCCTGGAGTT CCTCCTGCCC 600

CTGGGCATCA TCCTGTTCTG CTCAGCCAGA ATTATCTGGA GCCTGCG3rA GAGACAAATG 660

GACCGGCATG CCAAGATCAA GAGAGCCAAA ACCTTCATCA TGGTGGTGGC CATCGTCTTT 720

GTCATCTGCT TCCTTCCCAG CGTGGTTGTG CGGATCCGCA TCTTCTGGCT CCTGCACACT 780

TCGGGCACGC AGAATTGTGA AGTGTACCGC TCGGTGGACC TGGCGTTCTT TATCACTCTC 840

AGCTTCACCT ACATGAACAG CATGCTGGAC CCCGTGGTGT ACTACTTCTC CAGCCCATCC 900

TTTCCCAACT TCTTCTCCAC TTTGATCAAC CGCTGCCTCC AGAGGAAGAT GACAGGTGAG 960

CCAGATAATA ACCGCAGCAC GAGCGTCGAG CTCACAGGGG ACCCCAACAA AACCAGAGGC 1020

GCTCCAGAGG CGTTAATGGC CAACTCCGGT GAGCCATGGA GCCCCTCTTA TCTGGGCCCA 1080

ACCTCAAATA ACCATTCCAA GAAGGGACAT TGTCACCAAG AACCAGCATC TCTGGAGAAA 1140

CAGTTGGGCT GTTGCATCGA GTAA 1164

(223) INFORMATION FOR SEQ ID NO: 222:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 387 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222:

Met Asn Arg His His Leu Gln Asp His Phe Leu Glu Ile Asp Lys Lys
1 5 10 15

Asn Cys Cys Val Phe Arg Asp Asp Phe Ile Ala Lys Val Leu Pro Pro
20 25 30

Val Leu Gly Leu Glu Phe Ile Phe Gly Leu Leu Gly Asn Gly Leu Ala
35 40 45

Leu Trp Ile Phe Cys Phe His Leu Lys Ser Trp Lys Ser Ser Arg Ile
50 55 60

Phe Leu Phe Asn Leu Ala Val Ala Asp Phe Leu Leu Ile Ile Cys Leu
65 70 75 80

Pro Phe Val Met Asp Tyr Tyr Val Arg Arg Ser Asp Trp Lys Phe Gly
85 90 95

Asp Ile Pro Cys Arg Leu Val Leu Phe Met Phe Ala Met Asn Arg Gln
100 105 110

Gly Ser Ile Ile Phe Leu Thr Val Val Ala Val Asp Arg Tyr Phe Arg
115 120 125

Val Val His Pro His His Ala Leu Asn Lys Ile Ser Asn Trp Thr Ala
130 135 140

Ala Ile Ile Ser Cys Leu Leu Trp Gly Ile Thr Val Gly Leu Thr Val
145 150 155 160

His Leu Leu Lys Lys Lys Leu Leu Ile Gln Asn Gly Pro Ala Asn Val
165 170 175

Cys Ile Ser Phe Ser Ile Cys His Thr Phe Arg Trp His Glu Ala Met
180 185 190

Phe Leu Leu Glu Phe Leu Leu Pro Leu Gly Ile Ile Leu Phe Cys Ser
195 200 205

Ala Arg Ile Ile Trp Ser Leu Arg Gln Arg Gln Met Asp Arg His Ala
210 215 220

Lys Ile Lys Arg Ala Lys Thr Phe Ile Met Val Val Ala Ile Val Phe
225 230 235 240

Val Ile Cys Phe Leu Pro Ser Val Val Val Arg Ile Arg Ile Phe Trp
245 250 255

Leu Leu His Thr Ser Gly Thr Gln Asn Cys Glu Val Tyr Arg Ser Val
260 265 270

Asp Leu Ala Phe Phe Ile Thr Leu Ser Phe Thr Tyr Met Asn Ser Met
275 280 285

Leu Asp Pro Val Val Tyr Tyr Phe Ser Ser Pro Ser Phe Pro Asn Phe
290 295 300

Phe Ser Thr Leu Ile Asn Arg Cys Leu Gln Arg Lys Met Thr Gly Glu
305 310 315 320

Pro Asp Asn Asn Arg Ser Thr Ser Val Glu Leu Thr Gly Asp Pro Asn
325 330 335

Lys Thr Arg Gly Ala Pro Glu Ala Leu Met Ala Asn Ser Gly Glu Pro
340 345 350

Trp Ser Pro Ser Tyr Leu Gly Pro Thr Ser Asn Asn His Ser Lys Lys
355 360 365

Gly His Cys His Gln Glu Pro Ala Ser Leu Glu Lys Gln Leu Gly Cys
370 375 380

Cys Ile Glu
385 

(224) INFORMATION FOR SEQ ID NO:223: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1212 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:

ATGGCTTGCA ATGGCAGTGC GGCCAGGGGG CACTTTGACC CTGAGGACTT GAACCTGACT 60

GACGAGGCAC TGAGACTCAA GTACCTGGGG CCCCAGCA^A CAGAG :TGTT CATGCCCATC 120

TGTGCCACAT ACCTGCTGAT "TTCGTGGTG GGCGCTGT^G GCAATGGGCT GACCTGTCTG 180

GTCATCCTGC GCCACAAGGC CATGCGCACG CCTACCAArT ACTACCTCTT CAGCCTGGCC 240

GTGTCGGACC TGCTGGTGCT GCTGGTGGGC CTGCCCCTGG AGCTCTATGA GATGTGGCAC 300

AACTACCCCT TCCTGCTGGG CGTTGGTGGC TGCTATTTCC GCACGCTACT GTTTGAGATG 360

GTCTGCCTGG CCTCAGTGCT CAACGTCACT GCCCTGAGCG TGGAACGCTA TGTGGCCGTG 420

GTGCACCCAC TCCAGGCCAG GTCCATGGTG ACGCGGGCCC ATGTGCGCCG AGTGCTTGGG 480

GCCGTCTGGG GTCTTGCCAT GCTCTGCTCC CTGCCGAACA CCAGCCTGCA CGGCATCCGG 540

CAGCTGCACG TGCCCTGCCG GGGCCCAGTG CCAGACTCAG CTGTTTGCAT GCTGGTCCGC 600

CCACGGGCCC TCTACAACAT GGTAGTGCAG ACCACCGCGC TGCTCTTCTT CTGCCTGCCC 660

ATGGCCATCA TGAGCGTGCT CTACCTGCTC ATTGGGCTGC GACTGCGGCG GGAGAGGCTG 720

CTGCTCATGC At^AGGCCAA GGGCAGGGGC TCTGCAGCAG CCAGGTCCAG ATACACCTGC 780

AGGCTCCAGC AGCACGATCG GGGCCGGAGA CAAGTGAAGA AGATGCTGTT TGTCCTGGTC 840

GTGGTGTTTG GCATCTGCTG GGCCCCGTTC CACGCCGACC GCGTCATGTG GAGCGTCGTG 900

TCACAGTGGA CAGATGGCCT GCACCTGGCC TTCCAGCACG TGCACGTCAT CTCCGGCATC 960

TTCTTCTACC TGGGCTCGGC GGCCAACCCC GTGCTCTATA GCCTCATGTC CAGCCGCTTC 1020

CGAGAGACCT TCCAGGAGGC CCTGTGCCTC GGGGCCTGCT GCCATCGCCT CAGACCCCGC 1080

CACAGCTCCC ACAGCCTCAG CAGGATGACC ACAGGCAGCA CCCTGTGTCA TGTGGGCTCC 1140

CTGGGCAGCT GGGTCCACCC CCTGGCTGGG AACGATGGCC CAGAGGCGCA GCAAGAGACC 1200

GATCCATCCT GA 1212 

(225) INFORMATION FOR SEQ ID NO: 224:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 403 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224: 

Met Ala Cys Asn Gly Ser Ala Ala Arg Gly His Phe Asp Pro Glu Asp
1 5 10 15

Leu Asn Leu Thr Asp Glu Ala Leu Arg Leu Lys Tyr Leu Gly Pro Gln
20 25 30

Gln Thr Glu Leu Phe Met Pro Ile Cys Ala Thr Tyr Leu Leu Ile Phe
35 40 45

Val Val Gly Ala Val Gly Asn Gly Leu Thr Cys Leu Val Ile Leu Arg
50 55 60

His Lys Ala Met Arg Thr Pro Thr Asn Tyr Tyr Leu Phe Ser Leu Ala
65 70 75 80

Val Ser Asp Leu Leu Val Leu Leu Val Gly Leu Pro Leu Glu Leu Tyr
85 90 95

Glu Met Trp His Asn Tyr Pro Phe Leu Leu Gly Val Gly Gly Cys Tyr
100 105 110

Phe Arg Thr Leu Leu Phe Glu Met Val Cys Leu Ala Ser Val Leu Asn
115 120 125

Val Thr Ala Leu Ser Val Glu Arg Tyr Val Ala Val Val His Pro Leu
130 135 140

Gln Ala Arg Ser Met Val Thr Arg Ala His Val Arg Arg Val Leu Gly
145 150 155 160

Ala Val Trp Gly Leu Ala Met Leu Cys Ser Leu Pro Asn Thr Ser Leu
165 170 175

His Gly Ile Arg Gln Leu His Val Pro Cys Arg Gly Pro Val Pro Asp
180 185 190

Ser Ala Val Cys Met Leu Val Arg Pro Arg Ala Leu Tyr Asn Met Val 
195 200 205 

Val Gln Thr Thr Ala Leu Leu Phe Phe Cys Leu Pro Met Ala Ile Met
210 215 220

Ser Val Leu Tyr Leu Leu Ile Gly Leu Arg Leu Arg Arg Glu Arg Leu
225 230 235 240

Leu Leu Met Gln Glu Ala Lys Gly Arg Gly Ser Ala Ala Ala Arg Ser
245 250 255

Arg Tyr Thr Cys Arg Leu Gln Gln His Asp Arg Gly Arg Arg Gln Val
260 265 270

Lys Lys Met Leu Phe Val Leu Val Val Val Phe Gly Ile Cys Trp Ala
275 280 285

Pro Phe His Ala Asp Arg Val Met Trp Ser Val Val Ser Gln Trp Thr
290 295 300

Asp Gly Leu His Leu Ala Phe Gln His Val His Val Ile Ser Gly Ile
305 310 315 320

Phe Phe Tyr Leu Gly Ser Ala Ala Asn Pro Val Leu Tyr Ser Leu Met
325 330 335

Ser Ser Arg Phe Arg Glu Thr Phe Gln Glu Ala Leu Cys Leu Gly Ala
340 345 350

Cys Cys His Arg Leu Arg Pro Arg His Ser Ser His Ser Leu Ser Arg
355 360 365

Met Thr Thr Gly Ser Thr Leu Cys Asp Val Gly Ser Leu Gly Ser Trp
370 375 380

Val His Pro Leu Ala Gly Asn Asp Gly Pro Glu Ala Gln Gln Glu Thr
385 390 395 400

Asp Pro Ser 



(226) INFORMATION FOR SEQ ID NO: 225:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1098 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225:

ATGGGGAACA TCACTGCAGA CAACTCCTCG ATGAGCTGTA CCATCGACCA TACCATCCAC 60

CAGACGCTGG CCCCGGTGGT CTATGTTACC GTGCTGGTCG TGGGCTTCCC GGCCAACTGC 120

CTGTCCCTCT ACTTCGGCTA CCTGCAGATC AAGGCCCGGA ACGAGCTGGG CGTGTACCTG 180

TGCAACCTGA CGGTGGCCGA CCTCTTCTAC ATCTGCTCGC TGCCCTTCTG GCTGCAGTAC 240

GTGCTGCAGC ACGACAACTG GTCTCACGGC GACCTGTCCT GCCAGGTGTG CGGCATCCTC 300

CTGTACGAGA ACATCTACAT CAGCGTGGGC TTCCTCTGCT GCATCTCCGT GGACCGCTAC 360

CTGGCTGTGG CCCATCCCTT CCGCTTCCAC CAGTTCCCGA CCCTGAAGGC GGCCGTCGGC 420

GTCAGCGTGG TCATCTGGGC CAAGGAGCTG CTGACCAGCA TCTACTTCCT GATGCACGAG 480

GAGGTCATCG AGGACGAGAA CCAGCACCGC GTGTGCTTTG AGCACTACCC CATCCAGGCA 540

TGGCAGCGCG CCATCAACTA CTACCGCTTC CTGGTGGGCT TCCTCTTCCC CATCTGCCTG 600

CTGCTGGCGT CCTACCAGGG CATCCTGCGC GCCGTGCGCC GGAGCCACGG r/iCCCAGAAG 660 

AGCCGCAA3G ACCAGATCAA GCCGCTGGTG CTCAGCACCG TGGTCATCTT CCTGGCCTGC 720

TTCCTGCC'^T ACCACGTGTT GCTGCTGGTG CGCAGCGTCT GGGAGGCCAG CTGCGACTTC 780

GCCAAGGGCG TTTTCAACGC CTACCACTTC TCCCTCCTGC TCACCAGGTT CAACTGCGTC 840

GCCGACCCCG TGCTCTACTG CTTCGTCAGC GAGACCACCC ACCC^GGACCT GGCCCGCCTC 900

CGCGGGGCCT GCCTGGCCTT CCTCACCTGC TCCAGGACCG GCCGGGCCAG GGAGGCCTAC 960

CCGCTGGGTG CCCCCGAGGC CTCCGGGAAA AGCGGGGCCC AGGGTGAGGA GCCCGAGCTC 1020

TTGACCAA::;..": TCCACCCGGC CTTCCAGACC CCTAACTCGC CAGGGTCGGG CGGGTTCCCC 1080

ACGGGCAGGT TGGCCTAG 1098

(227) INFORMATION FOR SEQ ID NO: 226:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 365 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:

Met Gly Asn Ile Thr Ala Asp Asn Ser Ser Met Ser Cys Thr Ile Asp
1 5 10 15

His Thr Ile His Gln Thr Leu Ala Pro Val Val Tyr Val Thr Val Leu
20 25 30

Val Val Gly Phe Pro Ala Asn Cys Leu Ser Leu Tyr Phe Gly Tyr Leu
35 40 45

Gln Ile Lys Ala Arg Asn Glu Leu Gly Val Tyr Leu Cys Asn Leu Thr
50 55 60

Val Ala Asp Leu Phe Tyr Ile Cys Ser Leu Pro Phe Trp Leu Gln Tyr
65 70 75 80

Val Leu Gln His Asp Asn Trp Ser His Gly Asp Leu Ser Cys Gln Val
85 90 95

Cys Gly Ile Leu Leu Tyr Glu Asn Ile Tyr Ile Ser Val Gly Phe Leu
100 105 110

Cys Cys Ile Ser Val Asp Arg Tyr Leu Ala Val Ala His Pro Phe Arg
115 120 125

Phe His Gln Phe Arg Thr Leu Lys Ala Ala Val Gly Val Ser Val Val
130 135 140

Ile Trp Ala Lys Glu Leu Leu Thr Ser Ile Tyr Phe Leu Met His Glu
145 150 155 160

Glu Val Ile Glu Asp Glu Asn Gln His Arg Val Cys Phe Glu His Tyr
165 170 175

Pro Ile Gln Ala Trp Gln Arg Ala Ile Asn Tyr Tyr Arg Phe Leu Val
180 185 190

Gly Phe Leu Phe Pro Ile Cys Leu Leu Leu Ala Ser Tyr Gln Gly Ile
195 200 205

Leu Arg Ala Val Arg Arg Ser His Gly Thr Gln Lys Ser Arg Lys Asp
210 215 220

Gln Ile Lys Arg Leu Val Leu Ser Thr Val Val Ile Phe Leu Ala Cys
225 230 235 240

Phe Leu Pro Tyr His Val Leu Leu Leu Val Arg Ser Val Trp Glu Ala
245 250 255

Ser Cys Asp Phe Ala Lys Gly Val Phe Asn Ala Tyr His Phe Ser Leu
260 265 270

Leu Leu Thr Ser Phe Asn Cys Val Ala Asp Pro Val Leu Tyr Cys Phe
275 280 285

Val Ser Glu Thr Thr His Arg Asp Leu Ala Arg Leu Arg Gly Ala Cys
290 295 300

Leu Ala Phe Leu Thr Cys Ser Arg Thr Gly Arg Ala Arg Glu Ala Tyr
305 310 315 320

Pro Leu Gly Ala Pro Glu Ala Ser Gly Lys Ser Gly Ala Gln Gly Glu
325 330 335

Glu Pro Glu Leu Leu Thr Lys Leu His Pro Ala Phe Gln Thr Pro Asn
340 345 350 

Ser Pro Gly Ser Gly Gly Phe Pro Thr Gly Arg Leu Ala 
355 360 365 

(228) INFORMATION FOR SEQ ID NO: 227: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1416 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227:

ATGGATATTC TTTGTGAAGA AAATACTTCT TTGAGCTCAA CTACGAACTC CCTAATGCAA 60

TTAAATGATG ACAACAGGCT CTACAGTAAT GACTTTAACT CCGGAGAAGC TAACACTTCT 120

GATGCATTTA ACTGGACAGT CGACTCTGAA AATCGAACCA ACCTTTCCTG TGAAGGGTGC 180

CTCTCACCGT CGTGTCTCTC CTTACTTCAT CTCCAGGhTJi AAAACTGGTC TGCTTTACTG 240

ACAGCCGTAG TGATTATTCT AACTATTGCT GGAAACATAC TCGTCATCAT GGCAGTGTCC 300

CTAGAGAAAA AGCTGCAGAA TGCCACCAAC TATTTCCTGA TGTCACTTGC CATAGCTGAT 360

ATGCTGCTGG GTTTCCTTGT CATGCCCGTG TCCATGTTAA CCATCCTGTA TGGGTACCGG 420

TGGCCTCTGC CGAGCAAGCT TTGTGCAGTC TGGATTTACC TGGACGTGCT CTTCTCCACG 480

GCCTCCATCA TGCACCTCTG CGCCATCTCG CTGGACCGCT ACGTCGCCAT CCAGAATCCC 540

ATCCACCACA GCCGCTTCAA CTCCAG7^.CT AAGGCATT^C TGAAAATCAT TGCTGTTTGG 600

ACCATATCAG TAGGTATATC CATGCCAATA CCAGTCTTTG GGCT^CT^GGh CGATTCGAAG 660 

GTCTTTAAGG AGGGGAGTTG CTTACTCGCC GATGATAACT TTGTCGTGAT CGGCTCTTTT 720

GTGTCATTTT TCATTCCCTT AACCATCATG GTGATCACCT ACTTTCTAAC TATCAAGTCA 780

CTCCAGAAAG AAGCTACTTT GTGTGTAAGT GATCTTGGCA CACGGGCCAA ATTAGCTTCT 840

TTCAGCTTCC TCCCTCAGAG TTCTTTGTCT TCAGAAAAGC TCTTCCAGCG GTCGATCCAT 900

AGGGAGCCAG GGTCCTACAC AGGCAGGAGG ACTATGCAGT CCATCAGCAA TGAGCAAAAG 960

GCAAAGAAGG TGCTGGGCAT CGTCTTCTTC CTGTTTGTGG TGATGTGGTG CCCTTTCTTC 1020

ATCACAAACA TCATGGCCGT CATCTGCAAA GAGTCCTGCA ATGAGGATGT CATTGGGGCC 1080

CTGCTCAATG TGTTTGTTTG GATCGGTTAT CTCTCTTCAG CAGTCAACCC ACTAGTCTAC 1140

ACACTGTTCA ACAAGACCTA TAGGTCAGCC TTTTCACGGT ATATTCAGTG TCAGTACAAG 1200

GAAAACAAAA AACCATTGCA GTTAATTTTA GTGAACACAA TACCGGCTTT GGCGTACAAG 1260

TCTAGCCAAC TTCAAATGGG ACAAAAAAAG AATTCAAAGC AAGATGCCAA GAC7UVCAGAT 1320

AATGACTGCT CAATGGTTGC TCTAGGAAAG CAGTATTCTG AAGAGGCTTC TAAAGACAAT 1380

AGCGACGGAG TGAATGAAAA GGTGAGCTGT GTGTGA 1416

(229) INFORMATION FOR SEQ ID NO: 228:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 470 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228:

Met Asp Ile Leu Cys Glu Glu Asn Thr Ser Leu Ser Ser Thr Thr Asn
1 5 10 15

Ser Leu Met Gln Leu Asn Asp Asp Asn Arg Leu Tyr Ser Asn Asp Phe
20 25 30

Asn Ser Gly Glu Ala Asn Thr Ser Asp Ala Phe Asn Trp Thr Val Asp
35 40 45

Ser Glu Asn Arg Thr Asn Leu Ser Cys Glu Gly Cys Leu Ser Pro Ser
50 55 60

Cys Leu Ser Leu Leu His Leu Gln Glu Lys Asn Trp Ser Ala Leu Leu
65 70 75 80

Thr Ala Val Val Ile Ile Leu Thr Ile Ala Gly Asn Ile Leu Val Ile
85 90 95

Met Ala Val Ser Leu Glu Lys Lys Leu Gln Asn Ala Thr Asn Tyr Phe
100 105 110

Leu Met Ser Leu Ala Ile Ala Asp Met Leu Leu Gly Phe Leu Val Met
115 120 125

Pro Val Ser Met Leu Thr Ile Leu Tyr Gly Tyr Arg Trp Pro Leu Pro
130 135 140

Ser Lys Leu Cys Ala Val Trp Ile Tyr Leu Asp Val Leu Phe Ser Thr
145 150 155 160

Ala Ser Ile Met His Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala
165 170 175

Ile Gln Asn Pro Ile His His Ser Arg Phe Asn Ser Arg Thr Lys Ala
180 185 190

Phe Leu Lys Ile Ile Ala Val Trp Thr Ile Ser Val Gly Ile Ser Met
195 200 205

Pro Ile Pro Val Phe Gly Leu Gln Asp Asp Ser Lys Val Phe Lys Glu
210 215 220

Gly Ser Cys Leu Leu Ala Asp Asp Asn Phe Val Leu Ile Gly Ser Phe
225 230 235 240

Val Ser Phe Phe Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Phe Leu
245 250 255

Thr Ile Lys Ser Leu Gln Lys Glu Ala Thr Leu Cys Val Ser Asp Leu
260 265 270

Gly Thr Arg Ala Lys Leu Ala Ser Phe Ser Phe Leu Pro Gln Ser Ser
275 280 285

Leu Ser Ser Glu Lys Leu Phe Gln Arg Ser Ile His Arg Glu Pro Gly
290 295 300

Ser Tyr Thr Gly Arg Arg Thr Met Gln Ser Ile Ser Asn Glu Gln Lys
305 310 315 320

Ala Lys Lys Val Leu Gly Ile Val Phe Phe Leu Phe Val Val Met Trp
325 330 335

Cys Pro Phe Phe Ile Thr Asn Ile Met Ala Val Ile Cys Lys Glu Ser
340 345 350

Cys Asn Glu Asp Val Ile Gly Ala Leu Leu Asn Val Phe Val Trp Ile
355 360 365

Gly Tyr Leu Ser Ser Ala Val Asn Pro Leu Val Tyr Thr Leu Phe Asn
370 375 380

Lys Thr Tyr Arg Ser Ala Phe Ser Arg Tyr Ile Gln Cys Gln Tyr Lys
385 390 395 400

Glu Asn Lys Lys Pro Leu Gln Leu Ile Leu Val Asn Thr Ile Pro Ala
405 410 415

Leu Ala Tyr Lys Ser Ser Gln Leu Gln Met Gly Gln Lys Lys Asn Ser
420 425 430

Lys Gln Asp Ala Lys Thr Thr Asp Asn Asp Cys Ser Met Val Ala Leu
435 440 445

Gly Lys Gln Tyr Ser Glu Glu Ala Ser Lys Asp Asn Ser Asp Gly Val
450 455 460 

Asn Glu Lys Val Ser Cys Val 
465 470 

(230) INFORMATION FOR SEQ ID NO: 229:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1377 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229:

ATGGTGAACC TGAGGAATGC GGTGCATTCA TTCCTTGTGC ACCTAATTGG CCTATTGGTT 60

TGGCAATGTG ATATTTCTGT GAGCCCAGTA GCAGCTATAG TAACTGACAT TTTCAATACC 120

TCCGATGGTG GACGCTTCAA ATTCCCAGAC GGGGTACAAA ACTGGCCAGC ACTTTCAATC 180

GTCATCATAA TAATCATGAC AATAGGTGGC AACATCCTTG TGATCATGGC AGTAAGCATG 240

GAAAAGAAAC TGCACAATGC CACCAATTAC TTCTTAATGT CCCTAGCCAT TGCTGATATG 300

CTAGTGGGAC TACTTGTCAT GCCCCTGTCT CTCCTGGCAA TCCTTTATGA TTATGTCTGG 360

CCACTACCTA GATATTTGTG CCCCGTCTGG ATTTCTTTAG ATGTTTTATT TTCAACAGCG 420

TCCATCATGC ACCTCTGCGC TATATCGCTG GATCGGTATG TAGCAATACG TAATCCTATT 480

GAGCATAGCC GTTTCAATTC GCGGACTAAG GCCATCATGA AGAT7GCTAT TGTTTGGGCA 540

ATTTCTATAG GTGTATCAGT TCCTATCCCT GTGATTGGAC TGAGGGACGA AGAAAAGGTG 600

TTCGTGAACA ACACGACGTG CGTGCTCAAC GACCCAAATT TCGTTCTTAT TGGGTCCTTC 660

GTAGCTTTCT TCATACCGCT GACGATTATG GTGATTACGT AI'TGCCTGAC CATCTACGTT 720

CTGCGCCGAC AAGCTTTGAT GTTACTGCAC GGCCACACCG AGGAACCGCC TGGACTAAGT 780

CTGGATTTCC TGAAGTGCTG CAAGAGGAAT ACGGCCGAGG AAGAGAACTC TGCAAACCCT 840

AACCAAGACC AGAACGCACG CCGAAGAAAG AAGAAGGAGA GACGTCCTAG GGGCACCATG 900

CAGGCTATCA ACAATGAAAG AAAAGCTAAG AAAGTCCTTG GGATTGTTTT CTTTGTGTTT 960

CTGATCATGT GGTGCCCATT TTTCATTACC AATATTCTGT CTGTTCTTTG TGAGAAGTCC 1020

TGTAACCAAA AGCTCATGGA AAAGCTTCTG AATGTGTTTG TTTGGATTGG CTATGTTTGT 1080

TCAGGAATCA ATCCTCTGGT GTATACTCTG TTCAACAA/iJ^ TTTACCG?i/iG GGCATTCTCC 1140

AACTATTTGC GTTGCAATTA TAAGGTAGAG AAAAAGCCTC CTGTCAGGCA GATTCCAAGA 1200

GTTGCCGCCA CTGCTTTGTC TGGGAGGGAG CTTAATGTTA ACATTTATCG GCATACCAAT 1260

GAACCGGTGA TCGAGAAAGC CAGTGACAAT GAGCCCGGTA TAGAGATGCA AGTTGAGAAT 1320

TTAGAGTTAC CAGTAAATCC CTCCAGTGTG GTTAGCGAAA GGATTAGCAG TGTGTGA 1377 

(231) INFORMATION FOR SEQ ID NO: 230:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 458 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230:

Met Val Asn Leu Arg Asn Ala Val His Ser Phe Leu Val His Leu Ile
1 5 10 15

Gly Leu Leu Val Trp Gln Cys Asp Ile Ser Val Ser Pro Val Ala Ala
20 25 30

Ile Val Thr Asp Ile Phe Asn Thr Ser Asp Gly Gly Arg Phe Lys Phe
35 40 45

Pro Asp Gly Val Gln Asn Trp Pro Ala Leu Ser Ile Val Ile Ile Ile
50 55 60

Ile Met Thr Ile Gly Gly Asn Ile Leu Val Ile Met Ala Val Ser Met
65 70 75 80

Glu Lys Lys Leu His Asn Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala
85 90 95

Ile Ala Asp Met Leu Val Gly Leu Leu Val Met Pro Leu Ser Leu Leu
100 105 110

Ala Ile Leu Tyr Asp Tyr Val Trp Pro Leu Pro Arg Tyr Leu Cys Pro
115 120 125

Val Trp Ile Ser Leu Asp Val Leu Phe Ser Thr Ala Ser Ile Met His
130 135 140

Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala Ile Arg Asn Pro Ile
145 150 155 160

Glu His Ser Arg Phe Asn Ser Arg Thr Lys Ala Ile Met Lys Ile Ala
165 170 175

Ile Val Trp Ala Ile Ser Ile Gly Val Ser Val Pro Ile Pro Val Ile
180 185 190

Gly Leu Arg Asp Glu Glu Lys Val Phe Val Asn Asn Thr Thr Cys Val
195 200 205

Leu Asn Asp Pro Asn Phe Val Leu Ile Gly Ser Phe Val Ala Phe Phe
210 215 220

Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Cys Leu Thr Ile Tyr Val
225 230 235 240

Leu Arg Arg Gln Ala Leu Met Leu Leu His Gly His Thr Glu Glu Pro
245 250 255 



Pro Gly Leu Ser Leu Asp Phe Leu Lys Cys Cys Lys Arg Asn Thr Ala
260 265 270

Glu Glu Glu Asn Ser Ala Asn Pro Asn Gln Asp Gln Asn Ala Arg Arg
275 280 285

Arg Lys Lys Lys Glu Arg Arg Pro Arg Gly Thr Met Gln Ala Ile Asn
290 295 300

Asn Glu Arg Lys Ala Lys Lys Val Leu Gly Ile Val Phe Phe Val Phe
305 310 315 320

Leu Ile Met Trp Cys Pro Phe Phe Ile Thr Asn Ile Leu Ser Val Leu
325 330 335

Cys Glu Lys Ser Cys Asn Gln Lys Leu Met Glu Lys Leu Leu Asn Val
340 345 350

Phe Val Trp Ile Gly Tyr Val Cys Ser Gly Ile Asn Pro Leu Val Tyr
355 360 365

Thr Leu Phe Asn Lys Ile Tyr Arg Arg Ala Phe Ser Asn Tyr Leu Arg
370 375 380

Cys Asn Tyr Lys Val Glu Lys Lys Pro Pro Val Arg Gln Ile Pro Arg
385 390 395 400

Val Ala Ala Thr Ala Leu Ser Gly Arg Glu Leu Asn Val Asn Ile Tyr
405 410 415

Arg His Thr Asn Glu Pro Val Ile Glu Lys Ala Ser Asp Asn Glu Pro
420 425 430

Gly Ile Glu Met Gln Val Glu Asn Leu Glu Leu Pro Val Asn Pro Ser
435 440 445

Ser Val Val Ser Glu Arg Ile Ser Ser Val
450 455 

(232) INFORMATION FOR SEQ ID NO: 231: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1068 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231:

ATGGATCAGT TCCCTGAATC AGTGACAGAA AACTTTGAGT ACGATGATTT GGCTGAGGCC 60

TGTTATATTG GGGACATCGT GGTCTTTGGG ACTGTGTTCC TGTCCATATT CTACTCCGTC 120

ATCTTTGCCA TTGGCCTGGT GGGAAATTTG TTGGTAGTGT TTGCCCTCAC CAACAGCAAG 180

AAGCCCAAGA GTGTCACCGA CATTTACCTC CTGAACCTGG CCTTGTCTGA TCTGCTGTTT 240

GTAGCCACTT TGCCCTTCTG GACTCACTAT TTGATAAATG AAAAGGGCCT CCACAATGCC 300

ATGTGCAAAT TCACTACCGC CTTCTTCTTC ATCGGCTTTT TTGGAAGCAT ATTCTTCATC 360

ACCGTCATCA GCATTGATAG GTACCTGGCC ATCGTCCTGG CCGCCAACTC CATGAACAAC 420

CGGACCGTGC AGCATGGCGT CACCATCAGC CTAGGCGTCT GGGCAGCAGC CATTTTGGTG 480

GCAGCACCCC AGTTCATGTT CACAAAGCAG AAAGAAAATG AATGCCTTGG TGACTACCCC 540

GAGGTCCTCC AGGAAATCTG GCCCGTGCTC CGCAATGTGG AAACAAATTT TCTTGGCTTC 600

CTACTCCCCC TGCTCATTAT GAGTTATTGC TACTTCAGAA TCATCCAGAC GCTGTTTTCC 660

TGCAAGAACC ACAAGAAAGC CAAAGCCAAG AAACTGATCC TTCTGGTGGT CATCGTGTTT 720

TTCCTCTTCT GGACACCCTA CAACGTTATG ATTTTCCTGG AGACGCTT;y\ GCTCTATGAC 780

TTCTTTCCCA GTTGTGACAT GAGGAAGGAT CTGAGGCTGG CCCTCAGTGT GACTGAGACG 840

GTTGCATTTA GCCATTGTTG CCTGAATCCT CTCATCTATG CATTTGGTGG GGAGAAGTTC 900

AGAAGATACC TTTACCACGT GTATGGGAAA TGCCTGGCTG TCCTGTGTGG GCGCTCAGTC 960

CACGTTGATT TCTCCTCATC TGAATCACAA AGGAGCAGGC ATGGAAGTGT TCTGAGCAGC 1020

AATTTTACTT ACCACACGAG TGATGGAGAT GCATTGCTCC TTCTCTGA 1068 

(233) INFORMATION FOR SEQ ID NO: 232:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232: 

Met Asp Gln Phe Pro Glu Ser Val Thr Glu Asn Phe Glu Tyr Asp Asp
1 5 10 15

Leu Ala Glu Ala Cys Tyr Ile Gly Asp Ile Val Val Phe Gly Thr Val
20 25 30

Phe Leu Ser Ile Phe Tyr Ser Val Ile Phe Ala Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Val Val Phe Ala Leu Thr Asn Ser Lys Lys Pro Lys Ser
50 55 60

Val Thr Asp Ile Tyr Leu Leu Asn Leu Ala Leu Ser Asp Leu Leu Phe
65 70 75 80

Val Ala Thr Leu Pro Phe Trp Thr His Tyr Leu Ile Asn Glu Lys Gly
85 90 95

Leu His Asn Ala Met Cys Lys Phe Thr Thr Ala Phe Phe Phe Ile Gly
100 105 110

Phe Phe Gly Ser Ile Phe Phe Ile Thr Val Ile Ser Ile Asp Arg Tyr
115 120 125

Leu Ala Ile Val Leu Ala Ala Asn Ser Met Asn Asn Arg Thr Val Gln
130 135 140

His Gly Val Thr Ile Ser Leu Gly Val Trp Ala Ala Ala Ile Leu Val
145 150 155 160

Ala Ala Pro Gln Phe Met Phe Thr Lys Gln Lys Glu Asn Glu Cys Leu
165 170 175

Gly Asp Tyr Pro Glu Val Leu Gln Glu Ile Trp Pro Val Leu Arg Asn
180 185 190

Val Glu Thr Asn Phe Leu Gly Phe Leu Leu Pro Leu Leu Ile Met Ser
195 200 205

Tyr Cys Tyr Phe Arg Ile Ile Gln Thr Leu Phe Ser Cys Lys Asn His
210 215 220

Lys Lys Ala Lys Ala Lys Lys Leu Ile Leu Leu Val Val Ile Val Phe
225 230 235 240

Phe Leu Phe Trp Thr Pro Tyr Asn Val Met Ile Phe Leu Glu Thr Leu
245 250 255

Lys Leu Tyr Asp Phe Phe Pro Ser Cys Asp Met Arg Lys Asp Leu Arg
260 265 270

Leu Ala Leu Ser Val Thr Glu Thr Val Ala Phe Ser His Cys Cys Leu
275 280 285

Asn Pro Leu Ile Tyr Ala Phe Ala Gly Glu Lys Phe Arg Arg Tyr Leu
290 295 300

Tyr His Leu Tyr Gly Lys Cys Leu Ala Val Leu Cys Gly Arg Ser Val
305 310 315 320

His Val Asp Phe Ser Ser Ser Glu Ser Gln Arg Ser Arg His Gly Ser
325 330 335

Val Leu Ser Ser Asn Phe Thr Tyr His Thr Ser Asp Gly Asp Ala Leu
340 345 350

Leu Leu Leu
355

(234) INFORMATION FOR SEQ ID NO: 233:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233:
GGCTTAAGAG CATCATCGTG GTGCTGGTG

(235) INFORMATION FOR SEQ ID NO:234: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234:
GTCACCACCA GCACCACGAT GATGCTCTTA AGCC

(236) INFORMATION FOR SEQ ID NO: 235:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235:

CAAAGAAAGT ACTGGGCATC GTCTTCTTCC T

(237) INFORMATION FOR SEQ ID NO: 236:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236:
TGCTCTAGAT TCCAGATAGG TGAAAACTTG 

(238) INFORMATION FOR SEQ ID NO: 237:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237:
CTAGGGGCAC CATGCAGGCT ATCAACAATG AAAGAAAAGC TAAGAAAGTC

(239) INFORMATION FOR SEQ ID NO: 238:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238: 
CAAGGACTTT CTTAGCTTTT CTTTCATTGT TGATAGCCTG CATGGTGCCC 

(240) INFORMATION FOR SEQ ID NO: 239: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239:
CGGCGGCAGA AGGCGAAACG CATGATCCTC GCGGT

(241) INFORMATION FOR SEQ ID NO: 240:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240: 
ACCGCGAGGA TCATGCGTTT CGCCTTCTGC CGCCG 

(242) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:241: 
GAGACATATT ATCTGCCACG GAGG

(243) INFORMATION FOR SEQ ID NO: 242:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242:
TTGGCATAGA AACCGGACCC AAGG

(244) INFORMATION FOR SEQ ID NO: 243:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:243: 
TAAGAATTCC ATAAAAATTA TGGAATGG 
(245) INFORMATION FOR SEQ ID NO:244: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244:
CCAGGATCCA GCTGAAGTCT TCCATCATTC 30

(246) INFORMATION FOR SEQ ID NO: 245:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1071 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245:

ATGAATGGGG TCTCGGAGGG GACCAGAGGC TGCAGTGACA GGCAACCTGG GGTCCTGACA 60

CGTGATCGCT CTTGTTCCAG GAAGATGAAC TCTTCCGGAT GCCTGTCTGA GGAGGTGGGG 120

TCCCTCCGCC CACTGACTGT GGTTATCCTG TCTGCGTCCA TTGTCGTCGG AGTGCTGGGC 180

AATGGGCTGG TGCTGTGGAT GACTGTCTTC CGTATGGCAC GCACGGTCTC CACCGTCTGC 240

TTCTTCCACC TGGCCCTTGC CGATTTCATG CTCTCACTGT CTCTGCCCAT TGCCATGTAC 300

TATATTGTCT CCAGGCAGTG GCTCCTCGGA GAGTGGGCCT GCAAACTCTA CATCACCTTT 360

GTGTTCCTCA GCTACTTTGC CAGTAACTGC CTCCTTGTCT TCATCTCTGT GGACCGTTGC 420

ATCTCTGTCC TCTACCCCGT CTGGGCCCTG AACCACCGCA CTGTGCAGCG GGCGAGCTGG 480

CTGGCCTTTG GGGTGTGGCT CCTGGCCGCC GCCTTGTGCT CTGCGCACCT GAAATTCCGG 540

ACAACCAGAA AATGGAATGG CTGTACGCAC TGCTACTTGG CGTTCAACTC TGACAATGAG 600

ACTGCCCAGA TTTGGATTGA AGGGGTCGTG GAGGGACACA TTATAGGGAC CATTGGCCAC 660

TTCCTGCTGG GCTTCCTGGG GCCCTTAGCA ATCATAGGCA CCTGCGCCCA CCTCATCCGG 720

GCCAAGCTCT TGCGGGAGGG CTGGGTCCAT GCCAACCGGC CCGCGAGGCT GCTGCTGGTG 780

CTGGTGAGCG CTTTCTTTAT CTTCTGGTCC CCGTTTAACG TGGTGCTGTT GGTCCATCTG 840

TGGCGACGGG TGATGCTCAA GGAAATCTAC CACCCCCGGA TGCTGCTCAT CCTCCAGGCT 900

AGCTTTGCCT TGGGCTGTGT CAACAGCAGC CTCAACCCCT TCCTCTACGT CTTCGTTGGC 960




AGAGATTTCC AAGAAAAGTT TTTCCAGTCT TTGACTTCTG CCCTGGCGAG GGCGTTTGGA 1020

GAGGAGGAGT TTCTGTCATC CTGTCCCCGT GGCAACGCCC CCCGGGAATG A 1071

(247) INFORMATION FOR SEQ ID NO: 246:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 356 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246:

Met Asn Gly Val Ser Glu Gly Thr Arg Gly Cys Ser Asp Arg Gln Pro
1 5 10 15

Gly Val Leu Thr Arg Asp Arg Ser Cys Ser Arg Lys Met Asn Ser Ser
20 25 30

Gly Cys Leu Ser Glu Glu Val Gly Ser Leu Arg Pro Leu Thr Val Val
35 40 45

Ile Leu Ser Ala Ser Ile Val Val Gly Val Leu Gly Asn Gly Leu Val
50 55 60

Leu Trp Met Thr Val Phe Arg Met Ala Arg Thr Val Ser Thr Val Cys
65 70 75 80

Phe Phe His Leu Ala Leu Ala Asp Phe Met Leu Ser Leu Ser Leu Pro
85 90 95

Ile Ala Met Tyr Tyr Ile Val Ser Arg Gln Trp Leu Leu Gly Glu Trp
100 105 110

Ala Cys Lys Leu Tyr Ile Thr Phe Val Phe Leu Ser Tyr Phe Ala Ser
115 120 125

Asn Cys Leu Leu Val Phe Ile Ser Val Asp Arg Cys Ile Ser Val Leu
130 135 140

Tyr Pro Val Trp Ala Leu Asn His Arg Thr Val Gln Arg Ala Ser Trp
145 150 155 160

Leu Ala Phe Gly Val Trp Leu Leu Ala Ala Ala Leu Cys Ser Ala His
165 170 175

Leu Lys Phe Arg Thr Thr Arg Lys Trp Asn Gly Cys Thr His Cys Tyr
180 185 190

Leu Ala Phe Asn Ser Asp Asn Glu Thr Ala Gln Ile Trp Ile Glu Gly
195 200 205

Val Val Glu Gly His Ile Ile Gly Thr Ile Gly His Phe Leu Leu Gly
210 215 220

Phe Leu Gly Pro Leu Ala Ile Ile Gly Thr Cys Ala His Leu Ile Arg
225 230 235 240

Ala Lys Leu Leu Arg Glu Gly Trp Val His Ala Asn Arg Pro Ala Arg
245 250 255

Leu Leu Leu Val Leu Val Ser Ala Phe Phe Ile Phe Trp Ser Pro Phe
260 265 270

Asn Val Val Leu Leu Val His Leu Trp Arg Arg Val Met Leu Lys Glu
275 280 285

Ile Tyr His Pro Arg Met Leu Leu Ile Leu Gln Ala Ser Phe Ala Leu
290 295 300

Gly Cys Val Asn Ser Ser Leu Asn Pro Phe Leu Tyr Val Phe Val Gly
305 310 315 320

Arg Asp Phe Gln Glu Lys Phe Phe Gln Ser Leu Thr Ser Ala Leu Ala
325 330 335

Arg Ala Phe Gly Glu Glu Glu Phe Leu Ser Ser Cys Pro Arg Gly Asn
340 345 350

Ala Pro Arg Glu
355

(248) INFORMATION FOR SEQ ID NO: 247: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247:
GCAGAATTCG GCGGCCCCAT GGACCTGCCC CC

(249) INFORMATION FOR SEQ ID NO: 248:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:
GCTGGATCCC CCGAGCAGTG GCGTTACTTC 30


(250) INFORMATION FOR SEQ ID NO: 249:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 903 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249:

ATGGACCTGC CCCCGCAGCT CTCCTTCGGC CTCTATGTGG CCGCCTTTGC GCTGGGCTTC 60

CCGCTCAACG TCCTGGCCAT CCGAGGCGCG ACGGCCCACG CCCGGCTCCG TCTCACCCCT 120 

AGCCTGGTCT ACGCCCTGAA CCTGGGCTGC TCCGACCTGC TGCTGACAGT CTCTCTGCCC 180 

CTGAAGGCGG TGGAGGCGCT AGCCTCCGGG GCCTGGCCTC TGCCGGCCTC GCTGTGCCCC 240

GTCTTCGCGG TGGCCCACTT CTTCCCACTC TATGCCGGCG GGGGCTTCCT GGCCGCCCTG 300

AGTGCAGGCC GCTACCTGGG AGCAGCCTTC CCCTTGGGCT ACCAAGCCTT CCGGAGGCCG 360

TGCTATTCCT GGGGGGTGTG CGCGGCCATC TGGGCCCTCG TCCTGTGTCA CCTGGGTCTG 420

GTCTTTGGGT TGGAGGCTCC AGGAGGCTGG CTGGACCACA GCAACACCTC CCTGGGCATC 480

AACACACCGG TCAACGGCTC TCCGGTCTGC CTGGAGGCCT GGGACCCGGC CTCTGCCGGC 540

CCGGCCCGCT TCAGCCTCTC TCTCCTGCTC TTTTTTCTGC CCTTGGCCAT CACAGCCTTC 600

TGCTACGTGG GCTGCCTCCG GGCACTGGCC CGCTCCGGCC TGACGCACAG GCGGAAGCTG 660

CGGGCCGCCT GGGTGGCCGG CGGGGCCCTC CTCACGCTGC TGCTCTGCGT AGGACCCTAC 720

AACGCCTCCA ACGTGGCCAG CTTCCTGTAC CCCAATCTAG GAGGCTCCTG GCGGAAGCTG 780

GGGCTCATCA CGGGTGCCTG GAGTGTGGTG CTTAATCCGC TGGTGACCGG TTACTTGGGA 840

AGGGGTCCTG GCCTGAAGAC AGTGTGTGCG GCAAGAACGC AAGGGGGCAA GTCCCAGAAG 900

TAA 903
(251) INFORMATION FOR SEQ ID NO: 250:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 300 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250:

Met Asp Leu Pro Pro Gln Leu Ser Phe Gly Leu Tyr Val Ala Ala Phe
1 5 10 15

Ala Leu Gly Phe Pro Leu Asn Val Leu Ala Ile Arg Gly Ala Thr Ala
20 25 30

His Ala Arg Leu Arg Leu Thr Pro Ser Leu Val Tyr Ala Leu Asn Leu
35 40 45

Gly Cys Ser Asp Leu Leu Leu Thr Val Ser Leu Pro Leu Lys Ala Val
50 55 60

Glu Ala Leu Ala Ser Gly Ala Trp Pro Leu Pro Ala Ser Leu Cys Pro
65 70 75 80

Val Phe Ala Val Ala His Phe Phe Pro Leu Tyr Ala Gly Gly Gly Phe
85 90 95

Leu Ala Ala Leu Ser Ala Gly Arg Tyr Leu Gly Ala Ala Phe Pro Leu
100 105 110

Gly Tyr Gln Ala Phe Arg Arg Pro Cys Tyr Ser Trp Gly Val Cys Ala
115 120 125

Ala Ile Trp Ala Leu Val Leu Cys His Leu Gly Leu Val Phe Gly Leu
130 135 140

Glu Ala Pro Gly Gly Trp Leu Asp His Ser Asn Thr Ser Leu Gly Ile
145 150 155 160

Asn Thr Pro Val Asn Gly Ser Pro Val Cys Leu Glu Ala Trp Asp Pro
165 170 175

Ala Ser Ala Gly Pro Ala Arg Phe Ser Leu Ser Leu Leu Leu Phe Phe
180 185 190

Leu Pro Leu Ala Ile Thr Ala Phe Cys Tyr Val Gly Cys Leu Arg Ala
195 200 205

Leu Ala Arg Ser Gly Leu Thr His Arg Arg Lys Leu Arg Ala Ala Trp
210 215 220

Val Ala Gly Gly Ala Leu Leu Thr Leu Leu Leu Cys Val Gly Pro Tyr
225 230 235 240

Asn Ala Ser Asn Val Ala Ser Phe Leu Tyr Pro Asn Leu Gly Gly Ser
245 250 255

Trp Arg Lys Leu Gly Leu Ile Thr Gly Ala Trp Ser Val Val Leu Asn
260 265 270

Pro Leu Val Thr Gly Tyr Leu Gly Arg Gly Pro Gly Leu Lys Thr Val
275 280 285

Cys Ala Ala Arg Thr Gln Gly Gly Lys Ser Gln Lys
290 295 300

(252) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251:
CTCAAGCTTA CTCTCTCTCA CCAGTGGCCA C 31

(253) INFORMATION FOR SEQ ID NO: 252:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252:

CCCTCCTCCC CCGGAGGACC TAGC 24

(254) INFORMATION FOR SEQ ID NO:253: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1041 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253:

ATGGATACAG GCCCCGACCA GTCCTACTTC TCCGGCAATC ACTGGTTCGT CTTCTCGGTG 60

TACCTTCTCA CTTTCCTGGT GGGGCTCCCC CTCAACCTGC TGGCCCTGGT GGTCTTCGTG 120

GGCAAGCTGC AGCGCCGCCC GGTGGCCGTG GACGTGCTCC TGCTCAACCT GACCGCCTCG 180

GACCTGCTCC TGCTGCTGTT CCTGCCTTTC CGCATGGTGG AGGCAGCCAA TGGCATGCAC 240

TGGCCCCTGC CCTTCATCCT CTGCCCACTC TCTGGATTCA TCTTCTTCAC CACCATCTAT 300

CTCACCGCCC TCTTCCTGGC AGCTGTGAGC ATTGAACGCT TCCTGAGTGT GGCCCACCCA 360

CTGTGGTACA AGACCCGGCC GAGGCTGGGG CAGGGAGGTC TGGTGAGTGT GGCCTGCTGG 420

CTGTTGGCCT CTGCTCACTG CAGCGTGGTC TACGTCATAG AATTCTCAGG GGACATCTCC 480

CACAGCCAGG GCACGAATGG GACCTGCTAC CTGGAGTTCC GGAAGGACCA GCTAGCCATC 540

CTCCTGCCGG TGCGGCTGGA GATGGCTGTG GTCCTCTTTG TGGTCCCGCT GATCATCACC 600

AGCTACTGCT ACAGCGGCCT GGTGTGGATC CTCGGCAGAG GGGGCAGCCA CCGCCGGCAG 660

AGGAGGGTGG CGGGGCTGTT GGCGGCCACG CTGCTCAACT TCCTTGTCTG CTTTGGGCGC 720

TACAACGTGT CCCATGTCGT GGGCTATATC TGCGGTGAAA GGCCGGCATG GAGGATCTAG 780

GTGACGCTTC TCAGCACCCT GAACTCCTGT GTCGACCCCT TTGTCTACTA CTTGTCCTCC 840

TCCGGGTTCC AAGCCGACTT TCATGAGCTG CTGAGGAGGT TGTGTGGGCT CTGGGGCCAG 900

TGGCAGCAGG AGAGCAGCAT GGAGCTGAAG GAGCAGAAGG GAGGGGAGGA GCAGAGAGCG 960

GACCGACCAG CTGAAAGAAA GACCAGTGAA CACTCACAGG GGTGTGGAAC TGGTGGCCAG 1020

GTGGCCTGTG CTGAAAGCTA G 1041
(255) INFORMATION FOR SEQ ID NO: 254:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 346 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:

Met Asp Thr Gly Pro Asp Gln Ser Tyr Phe Ser Gly Asn His Trp Phe
1 5 10 15

Val Phe Ser Val Tyr Leu Leu Thr Phe Leu Val Gly Leu Pro Leu Asn
20 25 30

Leu Leu Ala Leu Val Val Phe Val Gly Lys Leu Gln Arg Arg Pro Val
35 40 45

Ala Val Asp Val Leu Leu Leu Asn Leu Thr Ala Ser Asp Leu Leu Leu
50 55 60

Leu Leu Phe Leu Pro Phe Arg Met Val Glu Ala Ala Asn Gly Met His
65 70 75 80

Trp Pro Leu Pro Phe Ile Leu Cys Pro Leu Ser Gly Phe Ile Phe Phe
85 90 95

Thr Thr Ile Tyr Leu Thr Ala Leu Phe Leu Ala Ala Val Ser Ile Glu
100 105 110

Arg Phe Leu Ser Val Ala His Pro Leu Trp Tyr Lys Thr Arg Pro Arg
115 120 125

Leu Gly Gln Ala Gly Leu Val Ser Val Ala Cys Trp Leu Leu Ala Ser
130 135 140

Ala His Cys Ser Val Val Tyr Val Ile Glu Phe Ser Gly Asp Ile Ser
145 150 155 160

His Ser Gln Gly Thr Asn Gly Thr Cys Tyr Leu Glu Phe Arg Lys Asp
165 170 175

Gln Leu Ala Ile Leu Leu Pro Val Arg Leu Glu Met Ala Val Val Leu
180 185 190

Phe Val Val Pro Leu Ile Ile Thr Ser Tyr Cys Tyr Ser Arg Leu Val
195 200 205

Trp Ile Leu Gly Arg Gly Gly Ser His Arg Arg Gln Arg Arg Val Ala
210 215 220

Gly Leu Leu Ala Ala Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro
225 230 235 240

Tyr Asn Val Ser His Val Val Gly Tyr Ile Cys Gly Glu Ser Pro Ala
245 250 255

Trp Arg Ile Tyr Val Thr Leu Leu Ser Thr Leu Asn Ser Cys Val Asp
260 265 270

Pro Phe Val Tyr Tyr Phe Ser Ser Ser Gly Phe Gln Ala Asp Phe His
275 280 285

Glu Leu Leu Arg Arg Leu Cys Gly Leu Trp Gly Gln Trp Gln Gln Glu
290 295 300

Ser Ser Met Glu Leu Lys Glu Gln Lys Gly Gly Glu Glu Gln Arg Ala
305 310 315 320

Asp Arg Pro Ala Glu Arg Lys Thr Ser Glu His Ser Gln Gly Cys Gly
325 330 335

Thr Gly Gly Gln Val Ala Cys Ala Glu Ser
340 345

(256) INFORMATION FOR SEQ ID NO: 255:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:
TTTAAGCTTC CCCTCCAGGA TGCTGCCGGA C 31 

(257) INFORMATION FOR SEQ ID NO: 256:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256:
GGCGAATTCT GAAGGTCCAG GGAAACTGCT A 31

(258) INFORMATION FOR SEQ ID NO: 257:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 993 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257:

ATGCTGCCGG ACTGGAAGAG CTCCTTGATC CTCATGGCTT ACATCATCAT CTTCCTCACT 60

GGCCTCCCTG CCAACCTCCT GGCCCTGCGG GCCTTTGTGG G^CGGATCCG CCAGCCCCAG 120 

CCTGCACCTG TGCACATCCT CCTGCTGAGC CTGACGCTGG CCGACCTCCT CCTGCTGCTG 180 

CTGCTGCCCT TCAAGATCAT CGAGGCTGCG TCGAACTTCC GCTGGTACCT GCCCAAGGTC 240

GTCTGCGCCC TCACGAGTTT TGGCTTCTAC AGCAGCATCT ACTGCAGCAC GTGGCTCCTG 300

GCGGGCATCA GCATCGAGCG CTACCTGGGA GTGGCTTTCC CCGTGCAGTA CAAGCTCTCC 360

CGCCGGCCTC TGTATGGAGT GATTGCAGCT CTGGTGGCCT GCGTTATGTC CTTTGGTCAC 420

TGCACCATCG TGATCATCGT TCAATACTTG AACACGACTG AGCAGGTCAG AAGTGGCAAT 480

GAAATTACCT GCTACGAGAA CTTCACCGAT AACCAGTTGG ACGTGGTGCT GCCCGTGCGG 540

CTGGAGCTGT GCCTGGTGCT CTTCTTCATC CCCATGGCAG TCACCATCTT CTGCTACTGG 600

CGTTTTGTGT GGATCATGCT CTCCCAGCCC CTTGTGGGGG CCCAGAGGCG GCGCCGAGCC 660

GTGGGGCTGG CTGTGGTGAC GCTGCTCAAT TTCCTGGTGT GCTTCGGACC TTACAACGTG 720

TCCCACCTGG TGGGGTATCA CCAGAGAAAA AGCCCCTGGT GGCGGTCAAT AGCCGTGGTG 780

TTCAGTTCAC TCAACGCCAG TCTGGACCCC CTGCTCTTCT ATTTCTCTTC TTCAGTGGTG 840

CGCAGGGCAT TTGGGAGAGG GCTGCAGGTG CTGCGGAATC AGGGCTCCTC CCTGTTGGGA 900

CGCAGAGGCA AAGACACAGC AGAGGGGACA AATGAGGACA GGGGTGTGGG TCAAGGAGAA 960

GGGATGCCAA GTTCGGACTT CACTACAGAG TAG 993
(259) INFORMATION FOR SEQ ID NO: 258:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258:

Met Leu Pro Asp Trp Lys Ser Ser Leu Ile Leu Met Ala Tyr Ile Ile
1 5 10 15

Ile Phe Leu Thr Gly Leu Pro Ala Asn Leu Leu Ala Leu Arg Ala Phe
20 25 30

Val Gly Arg Ile Arg Gln Pro Gln Pro Ala Pro Val His Ile Leu Leu
35 40 45

Leu Ser Leu Thr Leu Ala Asp Leu Leu Leu Leu Leu Leu Leu Pro Phe
50 55 60

Lys Ile Ile Glu Ala Ala Ser Asn Phe Arg Trp Tyr Leu Pro Lys Val
65 70 75 80

Val Cys Ala Leu Thr Ser Phe Gly Phe Tyr Ser Ser Ile Tyr Cys Ser
85 90 95

Thr Trp Leu Leu Ala Gly Ile Ser Ile Glu Arg Tyr Leu Gly Val Ala
100 105 110

Phe Pro Val Gln Tyr Lys Leu Ser Arg Arg Pro Leu Tyr Gly Val Ile
115 120 125

Ala Ala Leu Val Ala Trp Val Met Ser Phe Gly His Cys Thr Ile Val
130 135 140

Ile Ile Val Gln Tyr Leu Asn Thr Thr Glu Gln Val Arg Ser Gly Asn
145 150 155 160

Glu Ile Thr Cys Tyr Glu Asn Phe Thr Asp Asn Gln Leu Asp Val Val
165 170 175

Leu Pro Val Arg Leu Glu Leu Cys Leu Val Leu Phe Phe Ile Pro Met
180 185 190

Ala Val Thr Ile Phe Cys Tyr Trp Arg Phe Val Trp Ile Met Leu Ser
195 200 205

Gln Pro Leu Val Gly Ala Gln Arg Arg Arg Arg Ala Val Gly Leu Ala
210 215 220

Val Val Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro Tyr Asn Val
225 230 235 240

Ser His Leu Val Gly Tyr His Gln Arg Lys Ser Pro Trp Trp Arg Ser
245 250 255

Ile Ala Val Val Phe Ser Ser Leu Asn Ala Ser Leu Asp Pro Leu Leu
260 265 270

Phe Tyr Phe Ser Ser Ser Val Val Arg Arg Ala Phe Gly Arg Gly Leu
275 280 285

Gln Val Leu Arg Asn Gln Gly Ser Ser Leu Leu Gly Arg Arg Gly Lys
290 295 300

Asp Thr Ala Glu Gly Thr Asn Glu Asp Arg Gly Val Gly Gln Gly Glu
305 310 315 320

Gly Met Pro Ser Ser Asp Phe Thr Thr Glu
325 330

(260) INFORMATION FOR SEQ ID NO: 259:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259:
CCCAAGCTTC GGGCACCATG GACACCTCCC 
(261) INFORMATION FOR SEQ ID NO: 260:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:
ACAGGATCCA AATGCACAGC ACTGGTAAGC 30

(262) INFORMATION FOR SEQ ID NO: 261:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261:
CTATAACTGG GTTACATGGT TTAAC 25

(263) INFORMATION FOR SEQ ID NO: 262:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262:
TTTGAATTCA CATATTAATT AGAGACATGG 30
(264) INFORMATION FOR SEQ ID NO:263: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2724 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263:
ATGGACACCT CCCGGCTCGG TGTGCTCCTG TCCTTGCCTG TGCTGCTGCA GCTGGCGACC 60

GGGGGCAGCT CTCCCAGGTC TGGTGTGTTG CTGAGGGGCT GCCCCACACA CTGTCATTGC 120

GAGCCCGACG GCAGGATGTT GCTCAGGGTG GACTGCTCCG ACCTGGGGCT CTCGGAGCTG 180

CCTTCCAACC TCAGCGTCTT CACCTCCTAC CTAGACCTCA GTATGAACAA CATCAGTCAG 240

CTGCTCCCGA ATCCCCTGCC CAGTCTCCGC TTCCTGGAGG AGTTACGTCT TGCGGGAAAC 300

GCTCTGACAT ACATTCCCAA GGGAGCATTC ACTGGCCTTT ACAGTCTTAA AGTTCTTATG 360 
CTGCAGAATA ATCAGCTAAG ACACGTACCC ACAGAAGCTC TGCAGAATTT GCGAAGCCTT 420

CAATCCCTGC GTCTGGATGC TAACCACATC AGCTATGTGC CCCCAAGCTG TTTCAGTGGC 480

CTGCATTCCC TGAGGCACCT GTGGCTGGAT GACAATGCGT T/iACAGAAAT CCCCGTCCAG 540

GCTTTTAGAA GTTTATCGGC ATTGCAAGCC ATGACGTTGG CCCTGAACAA AATACACCAC 600

ATACCAGACT ATGCCTTTGG AAACCTCTCC AGCTTGGTAG TTCTACATCT CCATAACAAT 660

AGAATCCACT CCCTGGGAAA GAAATGCTTT GATGGGCTCC ACAGCCTAGA GACTTTAGAT 720

TTAAATTACA ATAACCTTGA TGAATTCCCC ACTGCAATTA GGACACTCTC CAACCTTAAA 780

GAACTAGGAT TTCATAGCAA CAATATCAGG TCGATACCTG AGAAAGCATT TGTAGGCAAC 840

CCTTCTCTTA TTACAATACA TTTCTATGAC AATCCCATCC AATTTGTTGG GAGATCTGGT 900

TTTCAACATT TACCTGAACT AAGAACACTG ACTCTGAATG GTGCCTCACA AATAACTGAA 960

TTTCCTGATT TAACTGGAAC TGCAAACCTG GAGAGTCTGA CTTTAACTGG AGCACAGATC 1020

TCATCTCTTC CTCAAACCGT CTGCAATCAG TTACCTAATC TCCAAGTGCT AGATCTGTCT 1080

TACAACCTAT TAGAAGATTT ACCCAGTTTT TCAGTCTGCC AAAAGCTTCA GAAAATTGAC 1140

CTAAGACATA ATGAAATCTA CGAAATTAAA GTTGACACTT TCCAGCAGTT GCTTAGGCTC 1200

CGATCGCTGA ATTTGGCTTG GAACAAAATT GCTATTATTC ACCCCAATGC ATTTTCCACT 1260

TTGCCATCCC TAATAAAGCT GGACCTATCG TCCAACCTCC TGTCGTCTTT TCCTATAACT 1320

GGGTTACATG GTTTAACTCA CTTTAAATTA ACAGGAAATC ATGCCTTACA GAGCTTGATA 1380

TCATCTGAAA ACTTTCCAGA ACTCAAGGTT ATAGAAATGC CTTATGCTTA CCAGTGCTGT 1440

GCATTTGGAG TGTGTGAGAA TGCCTATAAG ATTTCTAATC AATGGAATAA AGGTGACAAC 1500

AGCAGTATGG ACGACCTTCA TAAGAAAGAT GCTGGAATGT TTCAGGCTCA AGATGAACGT 1560

GACCTTGAAG ATTTCCTGCT TGACTTTGAG GAAGACCTGA AAGCCCTTCA TTCAGTGCAG 1620

TGTTCACCTT CCCCAGGCCC CTTCAAACCC TGTGAACACC TG PT Tn A Taa 1680

AGAATTGGAG TGTGGACCAT AGCAGTTCTG GCACTTACTT GTAATGCTTT GGTGACTTCA 1740

ACAGTTTTCA GATCCCCTCT GTACATTTCC CCCATTAAAC TGTTAATTGG GGTCATCGCA 1800

GCAGTGAACA TGCTCACGGG AGTCTCCAGT GCCGTGCTGG CTGGTGTGGA TGCGTTCACT 1860

TTTGGCAGCT TTGCACGACA TGGTGCCTGG TGGGAGAATG GGGTTGGTTG CCATGTCATT 1920

GGTTTTTTGT CCATTTTTGC TTCAGAATCA TCTGTTTTCC TGCTTACTCT GGCAGCCCTG 1980

GAGCGTGGGT TCTCTGTGAA ATATTCTGCA AAATTTGAAA CGAAAGCTCC ATTTTCTAGC 2040

CTGAAAGTAA TCATTTTGCT CTGTGCCCTG CTGGCCTTGA CCATGGCCGC AGTTCCCCTG 2100

CTGGGTGGCA GCAAGTATGG CGCCTCCCCT CTCTGCCTGC CTTTGCCTTT TGGGGAGCCX 2160

AGCACCATGG GCTACATGGT CGCTCTCATC TTGCTCAATT CCCTTTGCTT CCTCATGATG 2220

ACCATTGCCT ACACCAAGCT CTACTGCAAT TTGGACAAGG GAGACCTGGA GAATATTTGG 2280

GACTGCTCTA TGGTAAAACA CATTGCCCTG TTGCTCTTCA CCAACTGCAT CCTAAACTGC 2340

CGTGTGGCTT TCTTGTCCTT CTCCTCTTTA ATAAACCTTA CATTTATCAG TCCTGAAGTA 2400

ATTAAGTTTA TCCTTCTGGT GGTAGTCCCA CTTCCTGCAT GTCTCAATCC CCTTCTCTAC 2460

ATCTTGTTCA ATCCTCACTT TAAGGAGGAT CTGGTGAGCC TGAGAAAGCA AACCTACGTC 2520

TGGACAAGAT CAAAACACCC AAGCTTGATG TCAATTAACT CTGATGATGT CGAAAAACAG 2580

TCCTGTGACT CAACTCAAGC CTTGGTAACC TTTACCAGCT CCAGCATCAC TTATGACCTG 2640

CCTCCCAGTT CCGTGCCATC ACCAGCTTAT CCAGTGACTG AGAGCTGCCA TCTTTCCTCT 2700

GTGGCATTTG TCCCATGTCT CTAA 2724
(265) INFORMATION FOR SEQ ID NO: 264:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 907 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264: 

Met Asp Thr Ser Arg Leu Gly Val Leu Leu Ser Leu Pro Val Leu Leu 
IS 10 15 

Gin Leu Ala Thr Gly Gly Ser Ser Pro Arg Ser Gly Val Leu Leu Arg 
20 25 30 

Gly Cys Pro Thr His Cys His Cys GIu Pro Asp Gly Arg Met Leu Leu 
35 40 45 

Arg Val Asp Cys Ser Asp Leu Gly Leu Ser Glu Leu Pro Ser Asn Leu 
50 55 60 

Ser Val Phe Thr Ser Tyr Leu Asp Leu Ser Met Asn Asn lie Ser Gin 
65 70 75 80 

Leu Leu Pro Asn Pro Leu Pro Ser Leu Arg Phe Leu Glu Glu Leu Arg 
85 90 95 

Leu Ala Gly Asn Ala Leu Thr Tyr lie Pro Lys Gly Ala Phe Thr Gly 
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110 



Leu T>'r Ser Leu Lys Val Leu Met Leu Gin Asn Asn Gin I,eu Arg His 
115 120 125 

Val Pro Thr Glu Ala Leu Gin Asn Leu Arg Ser Leu Gin Ser Leu Arg 
130 135 140 

Leu Asp Ala Asn His lie Ser Tyr Val Pro Pro Ser Cys Phe Ser Gly 

145 150 155 

Leu His Ser Leu Arg His Leu Trp Leu Asp Asp Asn Ala Leu Thr Glu 
165 170 175 

lie Pro Val Gin Ala Phe Arg Ser Leu Ser Ala Lexi Gin Ala Met Thr 
180 185 190 

Leu Ala Leu Asn Lys lie His His lie Pro Asp Tyr Ala Phe Gly Asn 
195 200 205 

Leu Ser Ser Leu Val Val Leu His Leu His Asn Asn Arg lie His Ser 
210 215 220 

Leu Gly Lys Lys Cys Phe Asp Gly Leu His Ser Leu Glu Thr Leu Asp 
225 230 235 240 

Leu Asn Tyr Asn Asn Leu Asp Glu Phe Pro Thr Ala lie Arg Thr Leu 
245 250 255 

Ser Asn Leu Lys Glu Leu Gly Phe His Ser Asn Asn lie Arg Ser lie 
260 265 270 

Pro Glu Lys Ala Phe Val Gly Asn Pro Ser Leu lie Thr lie His Phe 
275 230 285 

Tyr Asp Asn Pro He Gin Phe Val Gly Arg Ser Ala Phe Gin His Leu 
290 295 300 

Pro Glu Leu Arg Thr Leu Thr Leu Asn Gly Ala Ser Gin He Thr Glu 
305 310 315 320 

Phe Pro Asp Leu Thr Gly Thr Ala Asn Leu Glu Ser Leu Thr Leu Thr 
325 330 335 

Gly Ala Gin He Ser Ser Leu Pro Gin Thr Val Cys Asn Gin Leu Pro 
340 345 350 

Asn Leu Gin Val Leu Asp Leu Ser Tyr Asn Leu Leu Glu Asp Leu Pro 
355 360 365 

Ser Phe Ser Val Cys Gin Lys Leu Gin Lys He Asp Leu Arg His Asn 
370 375 380 

Glu He Tyr Glu He Lys Val Asp Thr Phe Gin Gin Leu Leu Ser Leu 
385 390 395 400 
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Arg Ser Leu Asn Leu Ala Trp Asn Lys lie Ala I]e He His Pro Asn 
405 410 415 

Ala Phe Scr Thr Leu Pro Ser Leu He Lys Leu Anp Leu Ser Ser Asn 
420 425 430 

Leu Leu Ser Ser Phe Pro Tie Thr Gly Leu His Gly Leu Thr His Leu 
435 440 445 

Lys Leu Thr Gly Asn His Ala Leu Gin Ser Leu He Ser Ser Glu Asn 
450 455 460 

Phe Pro Glu Leu Lys Val He Glu Met Pro Tyr Ala Tyr Gin Cys Cys 
465 470 475 480 

Ala Phe Gly Val Cys Glu Asn Ala Tyr Lys He Ser Asn Gin Trp Asn 
485 490 495 

Lys Gly AGp Asn Ser Ser Met Asp Asp Leu His Lys Lys Asp Ala Gly 
500 505 510 

Met Phe Gin Ala Gin Asp Glu Arg Asp Leu Glu Asp Phe Leu Leu Asp 
515 520 525 

Phe Glu Glu Asp Leu Lys Ala Leu His Ser Val Gin Cys Ser Pro Ser 
530 535 540 

Pro Gly Pro Phe Lys Pro Cys Glu His Leu Leu Asp Gly Trp Leu He 
545 550 555 560 

Arg He Gly Val Trp Thr He Ala Val Leu Ala Leu Thr Cys Asn Ala 
565 570 575 

Leu Val Thr Ser Thr Val Phe Arg Ser Pro Leu Tyr He Ser Pro He 
580 585 590 

Lys Leu Leu He Gly Val He Ala Ala Val Asn Met Leu Thr Gly Val 

595 500 605 

Ser Ser Ala Val Leu Ala Gly Val Asp Ala Phe Thr Phe Gly Ser Phe 
610 615 620 

Ala Arg His Gly Ala Trp Trp Glu Asn Gly Val Gly Cys His Val He 
625 630 635 640 

Gly Phe Leu Ser He Phe Ala Ser Glu Ser Ser Val Phe Leu Leu Thr 
645 650 655 

Leu Ala Ala Leu Glu Arg Gly Phe Ser Val Lys Tyr Ser Ala Lys Phe 
660 665 670 

Glu Thr Lys Ala Pro Phe Ser Ser Leu Lys Val He He Leu Lea Cys 
675 680 685 



Ala Leu Leu Ala Leu Thr Met Ala Ala Val Pro Leu Leu Gly Gly Ser 
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690 



695 



700 



0 



10 



Lys Tyr Gly Ala Ser Pro Leu Cys Leu Pro Leu Pro Phe Gly Glu Pro 
705 710 715 720 

Ser Thr Met Gly Tyr Met Val Ala Leu He Leu Leu Asn Ser Leu Cys 
725 730 735 

Phe Leu Met Met Thr Tie Ala Tyr Thr Lys Leu TyY Cys Asn Leu Asp 
740 745 750 

Lys Gly Asp Leu Glu Asn He Trp Asp Cys Ser Met Val Lys His He 
755 760 765 

Ala Leu Leu Leu Phe Thr Asn Cys He Leu Asn Cys Pro Val Ala Phe 
770 775 780 

Leu Ser Phe Ser Ser Leu Ile Asn Leu Thr Phe Ile Ser Pro Glu Val
785 790 795 800 

Ile Lys Phe Ile Leu Leu Val Val Val Pro Leu Pro Ala Cys Leu Asn
805 810 815 

Pro Leu Leu Tyr Ile Leu Phe Asn Pro His Phe Lys Glu Asp Leu Val
820 825 830 

Ser Leu Arg Lys Gln Thr Tyr Val Trp Thr Arg Ser Lys His Pro Ser
835 840 845 

Leu Met Ser Ile Asn Ser Asp Asp Val Glu Lys Gln Ser Cys Asp Ser
850 855 860 

Thr Gln Ala Leu Val Thr Phe Thr Ser Ser Ser Ile Thr Tyr Asp Leu
865 870 875 880

Pro Pro Ser Ser Val Pro Ser Pro Ala Tyr Pro Val Thr Glu Ser Cys 
885 890 895 

His Leu Ser Ser Val Ala Phe Val Pro Cys Leu 
900 905 



(266) INFORMATION FOR SEQ ID NO: 265:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:
CGGAAGCTGC GGGCCAAATG GGTGGCCGGC

(267) INFORMATION FOR SEQ ID NO: 266:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266:

CAGAGGAGGG TGAAGGGGCT GTTGGCG 

(268) INFORMATION FOR SEQ ID NO: 267:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267:
GGCGGCGCCG AGCCAAGGGG CTGGCTGTGG 

(269) INFORMATION FOR SEQ ID NO: 268:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268:
GGGACTGCTC TATGAAAAAA CACATTGCCC TG 
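
As an illustrative check rather than part of the listing, the 30-nucleotide SEQ ID NO: 265 occurs verbatim within the coding sequence of SEQ ID NO: 271. The minimal Python sketch below assumes a plain exact-substring search; the helper name `locate`, the 1-based inclusive coordinates, and the use of only nucleotides 601-720 of SEQ ID NO: 271 (copied from the listing) are assumptions made for brevity.

```python
def locate(oligo: str, template: str, offset: int = 0):
    """Return the 1-based inclusive (start, end) of the first exact match, or None."""
    o = oligo.replace(" ", "").upper()
    t = template.replace(" ", "").upper()
    i = t.find(o)
    if i < 0:
        return None
    # offset = number of template bases that precede the supplied fragment
    return (offset + i + 1, offset + i + len(o))


# Nucleotides 601-720 of SEQ ID NO: 271, copied from the listing.
seq271_601_720 = (
    "TGCTACGTGG GCTGCCTCCG GGCACTGGCC CGCTCCGGCC TGACGCACAG GCGGAAGCTG"
    "CGGGCCAAAT GGGTGGCCGG CGGGGCCCTC CTCACGCTGC TGCTCTGCGT AGGACCCTAC"
)
seq265 = "CGGAAGCTGC GGGCCAAATG GGTGGCCGGC"

print(locate(seq265, seq271_601_720, offset=600))  # (652, 681)
```

An exact search only reports perfect matches; an oligonucleotide that deliberately differs from its template at a few bases would need a mismatch-tolerant comparison instead.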

(270) INFORMATION FOR SEQ ID NO: 269: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1071 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269:

ATGAATGGGG TCTCGGAGGG GACCAGAGGC TGCAGTGACA GGCAACCTGG GGTCCTGACA 60

CGTGATCGCT CTTGTTCCAG GAAGATGAAC TCTTCCGGAT GCCTGTCTGA GGAGGTGGGG 120

TCCCTCCGCC CACTGACTGT GGTTATCCTG TCTGCGTCCA TTGTCGTCGG AGTGCTGGGC 180 

AATGGGCTGG TGCTGTGGAT GACTGTCTTC CGTATGGCAC GCACGGTCTC CACCGTCTGC 240

TTCTTCCACC TGGCCCTTGC CGATTTCATG CTCTCACTGT CTCTGCCCAT TGCCATGTAC 300

TATATTGTCT CCAGGCAGTG GCTCCTCGGA GAGTGGGCCT GCAAACTCTA CATCACCTTT 360

GTGTTCCTCA GCTACTTTGC CAGTAACTGC CTCCTTGTCT TCATCTCTGT GGACCGTTGC 420

ATCTCTGTCC TCTACCCCGT CTGGGCCCTG AACCACCGCA CTGTGCAGCG GGCGAGCTGG 480

CTGGCCTTTG GGGTGTGGCT CCTGGCCGCC GCCTTGTGCT CTGCGCACCT GAAATTCCGG 540

ACAACCAGAA AATGGAATGG CTGTACGCAC TGCTAC7TGG CGTTCAACTC TGACAATGAG 600

ACTGCCCAGA TTTGGATTGA AGGGGTCGTG GAGGGACACA TTATAGGGAC CATTGGCCAC 660

TTCCTGCTGG GCTTCCTGGG GCCCTTAGCA ATCATAGGCA CCTGCGCCCA CCTCATCCGG 720

GCCAAGCTCT TGCGGGAGGG CTGGGTCCAT GCCAACCGGC CCAAGAGGCT GCTGCTGGTG 780

CTGGTGAGCG CTTTCTTTAT CTTCTGGTCC CCGTTTAACG TGGTGCTGTT GGTCCATCTG 840

TGGCGACGGG TGATGCTCAA GGAAATCTAC CACCCCCGGA TGCTGCTCAT CCTCCAGGCT 900 

AGCTTTGCCT TGGGCTGTGT CAACAGCAGC CTCAACCCCT TCCTCTACGT CTTCGTTGGC 960

AGAGATTTCC AAGAAAAGTT TTTCCAGTCT TTGACTTCTG CCCTGGCGAG GGCGTTTGGA 1020

GAGGAGGAGT TTCTGTCATC CTGTCCCCGT GGCAACGCCC CCCGGGAATG A 1071 
(271) INFORMATION FOR SEQ ID NO: 270: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 356 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270:

Met Asn Gly Val Ser Glu Gly Thr Arg Gly Cys Ser Asp Arg Gin Pro 
1 5 10 15

Gly Val Leu Thr Arg Asp Arg Ser Cys Ser Arg Lys Met Asn Ser Ser
20 25 30

Gly Cys Leu Ser Glu Glu Val Gly Ser Leu Arg Pro Leu Thr Val Val 
35 40 45 
Ile Leu Ser Ala Ser Ile Val Val Gly Val Leu Gly Asn Gly Leu Val
50 55 60 

Leu Trp Met Thr Val Phe Arg Met Ala Arg Thr Val Ser Thr Val Cys 
65 70 75 80 

Phe Phe His Leu Ala Leu Ala Asp Phe Met Leu Ser Leu Ser Leu Pro
85 90 95

Ile Ala Met Tyr Tyr Ile Val Ser Arg Gln Trp Leu Leu Gly Glu Trp
100 105 110 

Ala Cys Lys Leu Tyr Ile Thr Phe Val Phe Leu Ser Tyr Phe Ala Ser
115 120 125

Asn Cys Leu Leu Val Phe Ile Ser Val Asp Arg Cys Ile Ser Val Leu
130 135 140 

Tyr Pro Val Trp Ala Leu Asn His Arg Thr Val Gln Arg Ala Ser Trp
145 150 155 160 

Leu Ala Phe Gly Val Trp Leu Leu Ala Ala Ala Leu Cys Ser Ala His
165 170 175

Leu Lys Phe Arg Thr Thr Arg Lys Trp Asn Gly Cys Thr His Cys Tyr
180 185 190 

Leu Ala Phe Asn Ser Asp Asn Glu Thr Ala Gln Ile Trp Ile Glu Gly
195 200 205

Val Val Glu Gly His Ile Ile Gly Thr Ile Gly His Phe Leu Leu Gly
210 215 220 

Phe Leu Gly Pro Leu Ala Ile Ile Gly Thr Cys Ala His Leu Ile Arg
225 230 235 240 

Ala Lys Leu Leu Arg Glu Gly Trp Val His Ala Asn Arg Pro Lys Arg
245 250 255

Leu Leu Leu Val Leu Val Ser Ala Phe Phe Ile Phe Trp Ser Pro Phe
260 265 270 

Asn Val Val Leu Leu Val His Leu Trp Arg Arg Val Met Leu Lys Glu 
275 280 285

Ile Tyr His Pro Arg Met Leu Leu Ile Leu Gln Ala Ser Phe Ala Leu
290 295 300 

Gly Cys Val Asn Ser Ser Leu Asn Pro Phe Leu Tyr Val Phe Val Gly 
305 310 315 320 

Arg Asp Phe Gln Glu Lys Phe Phe Gln Ser Leu Thr Ser Ala Leu Ala
325 330 335

Arg Ala Phe Gly Glu Glu Glu Phe Leu Ser Ser Cys Pro Arg Gly Asn
340 345 350

Ala Pro Arg Glu 
355 
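
As a sketch of how the two preceding entries relate, the Python below translates the first 60 nucleotides of SEQ ID NO: 269 and reproduces the first 20 residues of SEQ ID NO: 270. It assumes the reading frame starts at the initial ATG and that the standard genetic code (NCBI translation table 1) applies; `translate` is a hypothetical helper, not a library function.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# so "TTT" -> F, "TTC" -> F, "TTA" -> L, ..., "GGG" -> G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> list[str]:
    """Translate in-frame DNA (spaces ignored) to three-letter codes, stopping at a stop codon."""
    seq = dna.replace(" ", "").upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):  # ignore any incomplete trailing codon
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return residues


# Nucleotides 1-60 of SEQ ID NO: 269, as listed.
seq269_start = "ATGAATGGGG TCTCGGAGGG GACCAGAGGC TGCAGTGACA GGCAACCTGG GGTCCTGACA"
print(" ".join(translate(seq269_start)))
# Met Asn Gly Val Ser Glu Gly Thr Arg Gly Cys Ser Asp Arg Gln Pro Gly Val Leu Thr
```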

(272) INFORMATION FOR SEQ ID NO: 271:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 903 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271:

ATGGACCTGC CCCCGCAGCT CTCCTTCGGC CTCTATGTGG CCGCCTTTGC GCTGGGCTTC 60

CCGCTCAACG TCCTGGCCAT CCGAGGCGCG ACGGCCCACG CCCGGCTCCG TCTCACCCCT 120 

AGCCTGGTCT ACGCCCTGAA CCTGGGCTGC TCCGACCTGC TGCTGACAGT CTCTCTGCCC 180

CTGAAGGCGG TGGAGGCGCT AGCCTCCGGG GCCTGGCCTC T::;CCGGCCTC GCTGTGCCCC 240

GTCTTCGCGG TGGCCCACTT CTTCCCACTC TATGCCGGCG GGGGCTTCCT GGCCGCCCTG 300

AGTGCAGGCC GCTACCTGGG AGCAGCCTTC CCCTTGGGCT ACCAAGCCTT CCGGAGGCCG 360 

TGCTATTCCT GGGGGGTGTG CGCGGCCATC TGGGCCCTCG TCCTGTGTCA CCTGGGTCTG 420

GTCTTTGGGT TGGAGGCTCC AGGAGGCTGG CTGGACCACA GCAACACCTC CCTGGGCATC 480

AACACACCGG TCAACGGCTC TCCGGTCTGC CTGGAGGCCT GGGACCCGGC CTCTGCCGGC 540

CCGGCCCGCT TCAGCCTCTC TCTCCTGCTC TTTTTTCTGC CCTTGGCCAT CACAGCCTTC 600

TGCTACGTGG GCTGCCTCCG GGCACTGGCC CGCTCCGGCC TGACGCACAG GCGGAAGCTG 660

CGGGCCAAAT GGGTGGCCGG CGGGGCCCTC CTCACGCTGC TGCTCTGCGT AGGACCCTAC 720

AACGCCTCCA ACGTGGCCAG CTTCCTGTAC CCCAATCTAG GAGGCTCCTG GCGGAAGCTG 780 

GGGCTCATCA CGGGTGCCTG GAGTGTGGTG CTTAATCCGC TGGTGACCGG TTACTTGGGA 840

AGGGGTCCTG GCCTGAAGAC AGTGTGTGCG GCAAGAACGC AAGGGGGCAA GTCCCAGAAG 900 

TAA 903

(273) INFORMATION FOR SEQ ID NO: 272:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 300 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:272: 

Met Asp Leu Pro Pro Gln Leu Ser Phe Gly Leu Tyr Val Ala Ala Phe
1 5 10 15

Ala Leu Gly Phe Pro Leu Asn Val Leu Ala Ile Arg Gly Ala Thr Ala
20 25 30 

His Ala Arg Leu Arg Leu Thr Pro Ser Leu Val Tyr Ala Leu Asn Leu 
35 40 45 

Gly Cys Ser Asp Leu Leu Leu Thr Val Ser Leu Pro Leu Lys Ala Val 
50 55 60 

Glu Ala Leu Ala Ser Gly Ala Trp Pro Leu Pro Ala Ser Leu Cys Pro 
65 70 75 80 

Val Phe Ala Val Ala His Phe Phe Pro Leu Tyr Ala Gly Gly Gly Phe 
85 90 95 

Leu Ala Ala Leu Ser Ala Gly Arg Tyr Leu Gly Ala Ala Phe Pro Leu 
100 105 110

Gly Tyr Gln Ala Phe Arg Arg Pro Cys Tyr Ser Trp Gly Val Cys Ala
115 120 125 

Ala Ile Trp Ala Leu Val Leu Cys His Leu Gly Leu Val Phe Gly Leu
130 135 140 

Glu Ala Pro Gly Gly Trp Leu Asp His Ser Asn Thr Ser Leu Gly Ile
145 150 155 160 

Asn Thr Pro Val Asn Gly Ser Pro Val Cys Leu Glu Ala Trp Asp Pro 
165 170 175 

Ala Ser Ala Gly Pro Ala Arg Phe Ser Leu Ser Leu Leu Leu Phe Phe 
180 185 190 

Leu Pro Leu Ala Ile Thr Ala Phe Cys Tyr Val Gly Cys Leu Arg Ala
195 200 205 

Leu Ala Arg Ser Gly Leu Thr His Arg Arg Lys Leu Arg Ala Lys Trp 
210 215 220 

Val Ala Gly Gly Ala Leu Leu Thr Leu Leu Leu Cys Val Gly Pro Tyr 
225 230 235 240 

Asn Ala Ser Asn Val Ala Ser Phe Leu Tyr Pro Asn Leu Gly Gly Ser 
245 250 255 

Trp Arg Lys Leu Gly Leu Ile Thr Gly Ala Trp Ser Val Val Leu Asn
260 265 270

Pro Leu Val Thr Gly Tyr Leu Gly Arg Gly Pro Gly Leu Lys Thr Val 
275 280 285 

Cys Ala Ala Arg Thr Gln Gly Gly Lys Ser Gln Lys
290 295 300

(274) INFORMATION FOR SEQ ID NO: 273: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1041 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:273: 

ATGGATACAG GCCCCGACCA GTCCTACTTC TCCGGCAATC ACTGGTTCGT CTTCTCGGTG 60

TACCTTCTCA CTTTCCTGGT GGGGCTCCCC CTCAACCTGC TGGCCCTGGT GGTCTTCGTG 120

GGCAAGCTGC AGCGCCGCCC GGTGGCCGTG GAGGTGCTCC TGCTCAACCT GACCGCCTCG 180

GACCTGCTCC TGCTGCTGTT CCTGCCTTTC CGCATGGTGG AGGCAGCCAA TGGCATGCAC 240

TGGCCCCTGC CCTTCATCCT CTGCCCACTC TCTGGATTCA TCTTCTTCAC CACCATCTAT 300

CTCACCGCCC TCTTCCTGGC AGCTGTGAGC ATTGAACGCT TCCTGAGTGT GGCCCACCCA 360 

CTGTGGTACA AGACCCGGCC GAGGCTGGGG CAGGCAGGTC TGGTGAGTGT GGCCTGCTGG 420 

CTGTTGGCCT CTGCTCACTG CAGCGTGGTC TACGTCATAG AATTCTCAGG GGACATCTCC 480

CACAGCCAGG GCACCAATGG GACCTGCTAC CTGGAGTTCC GGAAGGACCA GCTAGCCATC 540 

CTCCTGCCCG TGCGGCTGGA GATGGCTGTG GTCCTCTTTG TGGTCCCGCT GATCATCACC 600

AGCTACTGCT ACAGCCGCCT GGTGTGGATC CTCGGCAGAG GGGGCAGCCA CCGCCGGCAG 660 

AGGAGGGTGA AGGGGCTGTT GGCGGCCACG CTGCTCAACT TCCTTGTCTG CTTTGGGCCC 720

TACAACGTGT CCCATGTCGT GGGCTATATC TGCGGTGAAA GCCCGGCATG GAGGATCTAC 780

GTGACGCTTC TCAGCACCCT GAACTCCTGT GTCGACCCCT TTGTCTACTA CTTCTCCTCC 840

TCCGGGTTCC AAGCCGACTT TCATGAGCTG CTGAGGAGGT TGTGTGGGCT CTGGGGCCAG 900

TGGCAGCAGG AGAGCAGCAT GGAGCTGAAG GAGCAGAAGG GAGGGGAGGA GCAGAGAGCG 960

GACCGACCAG CTGAAAGAAA GACCAGTGAA CACTCACAGG GCTGTGGAAC TGGTGGCCAG 1020

GTGGCCTGTG CTGAAAGCTA G 1041

(275) INFORMATION FOR SEQ ID NO: 274:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 346 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:274: 

Met Asp Thr Gly Pro Asp Gln Ser Tyr Phe Ser Gly Asn His Trp Phe
1 5 10 15

Val Phe Ser Val Tyr Leu Leu Thr Phe Leu Val Gly Leu Pro Leu Asn 
20 25 30 

Leu Leu Ala Leu Val Val Phe Val Gly Lys Leu Gln Arg Arg Pro Val
35 40 45 

Ala Val Asp Val Leu Leu Leu Asn Leu Thr Ala Ser Asp Leu Leu Leu 
50 55 60

Leu Leu Phe Leu Pro Phe Arg Met Val Glu Ala Ala Asn Gly Met His 
65 70 75 80 

Trp Pro Leu Pro Phe Ile Leu Cys Pro Leu Ser Gly Phe Ile Phe Phe
85 90 95 

Thr Thr Ile Tyr Leu Thr Ala Leu Phe Leu Ala Ala Val Ser Ile Glu
100 105 110 

Arg Phe Leu Ser Val Ala His Pro Leu Trp Tyr Lys Thr Arg Pro Arg 
115 120 125 

Leu Gly Gln Ala Gly Leu Val Ser Val Ala Cys Trp Leu Leu Ala Ser
130 135 140 

Ala His Cys Ser Val Val Tyr Val Ile Glu Phe Ser Gly Asp Ile Ser
145 150 155 160 

His Ser Gln Gly Thr Asn Gly Thr Cys Tyr Leu Glu Phe Arg Lys Asp
165 170 175 

Gln Leu Ala Ile Leu Leu Pro Val Arg Leu Glu Met Ala Val Val Leu
180 185 190 

Phe Val Val Pro Leu Ile Ile Thr Ser Tyr Cys Tyr Ser Arg Leu Val
195 200 205 

Trp Ile Leu Gly Arg Gly Gly Ser His Arg Arg Gln Arg Arg Val Lys
210 215 220

Gly Leu Leu Ala Ala Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro
225 230 235 240 

Tyr Asn Val Ser His Val Val Gly Tyr Ile Cys Gly Glu Ser Pro Ala
245 250 255 

Trp Arg Ile Tyr Val Thr Leu Leu Ser Thr Leu Asn Ser Cys Val Asp
260 265 270 

Pro Phe Val Tyr Tyr Phe Ser Ser Ser Gly Phe Gln Ala Asp Phe His
275 280 285 

Glu Leu Leu Arg Arg Leu Cys Gly Leu Trp Gly Gln Trp Gln Gln Glu
290 295 300 

Ser Ser Met Glu Leu Lys Glu Gln Lys Gly Gly Glu Glu Gln Arg Ala
305 310 315 320 

Asp Arg Pro Ala Glu Arg Lys Thr Ser Glu His Ser Gln Gly Cys Gly
325 330 335 

Thr Gly Gly Gln Val Ala Cys Ala Glu Ser
340 345 

(276) INFORMATION FOR SEQ ID NO: 275:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:275: 

ATGCTGCCGG ACTGGAAGAG CTCCTTGATC CTCATGGCTT ACATCATCAT CTTCCTCACT 60

GGCCTCCCTG CCAACCTCCT GGCCCTGCGG GCCTTTGTGG GGCGGATCCG CCAGCCCCAG 120 

CCTGCACCTG TGCACATCCT CCTGCTGAGC CTGACGCTGG CCGACCTCCT CCTGCTGCTG 180

CTGCTGCCCT TCAAGATCAT CGAGGCTGCG TCGAACTTCC GCTGGTACCT GCCCAAGGTC 240 

GTCTGCGCCC TCACGAGTTT TGGCTTCTAC AGCAGCATCT ACTGCAGCAC GTGGCTCCTG 300

GCGGGCATCA GCATCGAGCG CTACCTGGGA GTGGCTTTCC CCGTGCAGTA CAAGCTCTCC 360

CGCCGGCCTC TGTATGGAGT GATTGCAGCT CTGGTGGCCT GGGTTATGTC CTTTGGTCAC 420 

TGCACCATCG TGATCATCGT TCAATACTTG AACACGACTG AGCAGGTCAG AAGTGGCAAT 480

GAAATTACCT GCTACGAGAA CTTCACCGAT AACCAGTTGG ACGTGGTGCT GCCCGTGCGG 540

CTGGAGCTGT GCCTGGTGCT CTTCTTCATC CCCATGGCAG TCACCATCTT CTGCTACTGG 600

CGTTTTGTGT GGATCATGCT CTCCCAGCCC CTTGTGGGGG CCCAGAGGCG GCGCCGAGCC 660

AAGGGGCTGG CTGTGGTGAC GCTGCTCAAT TTCCTGGTGT GCTTCGGACC TTACAACGTG 720

TCCCACCTGG TGGGGTATCA CCAGAGAAAA AGCCCCTGGT GGCGGTCTAT AGCCGTGGTG 780

TTCAGTTCAC TCAACGCCAG TCTGGACCCC CTGCTCTTCT ATTTCTCTTC TTCAGTGGTG 840

CGCAGGGCAT TTGGGAGAGG GCTGCAGGTG CTGCGGAATC AGGGCTCCTC CCTGTTGGGA 900

CGCAGAGGCA AAGACACAGC AGAGGGGACA AATGAGGACA GGGGTGTGGG TCAAGGAGAA 960

GGGATGCCAA GTTCGGACTT CACTACAGAG TAG 993 
(277) INFORMATION FOR SEQ ID NO: 276: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276: 

Met Leu Pro Asp Trp Lys Ser Ser Leu Ile Leu Met Ala Tyr Ile Ile
1 5 10 15

Ile Phe Leu Thr Gly Leu Pro Ala Asn Leu Leu Ala Leu Arg Ala Phe
20 25 30 

Val Gly Arg Ile Arg Gln Pro Gln Pro Ala Pro Val His Ile Leu Leu
35 40 45 

Leu Ser Leu Thr Leu Ala Asp Leu Leu Leu Leu Leu Leu Leu Pro Phe 
50 55 60

Lys Ile Ile Glu Ala Ala Ser Asn Phe Arg Trp Tyr Leu Pro Lys Val
65 70 75 80 

Val Cys Ala Leu Thr Ser Phe Gly Phe Tyr Ser Ser Ile Tyr Cys Ser
85 90 95 

Thr Trp Leu Leu Ala Gly Ile Ser Ile Glu Arg Tyr Leu Gly Val Ala
100 105 110 

Phe Pro Val Gln Tyr Lys Leu Ser Arg Arg Pro Leu Tyr Gly Val Ile
115 120 125 

Ala Ala Leu Val Ala Trp Val Met Ser Phe Gly His Cys Thr Ile Val
130 135 140 

Ile Ile Val Gln Tyr Leu Asn Thr Thr Glu Gln Val Arg Ser Gly Asn
145 150 155 160

Glu Ile Thr Cys Tyr Glu Asn Phe Thr Asp Asn Gln Leu Asp Val Val
165 170 175 

Leu Pro Val Arg Leu Glu Leu Cys Leu Val Leu Phe Phe Ile Pro Met
180 185 190

Ala Val Thr Ile Phe Cys Tyr Trp Arg Phe Val Trp Ile Met Leu Ser
195 200 205

Gln Pro Leu Val Gly Ala Gln Arg Arg Arg Arg Ala Lys Gly Leu Ala
210 215 220 

Val Val Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro Tyr Asn Val 
225 230 235 240

Ser His Leu Val Gly Tyr His Gln Arg Lys Ser Pro Trp Trp Arg Ser
245 250 255 

Ile Ala Val Val Phe Ser Ser Leu Asn Ala Ser Leu Asp Pro Leu Leu
260 265 270 

Phe Tyr Phe Ser Ser Ser Val Val Arg Arg Ala Phe Gly Arg Gly Leu
275 280 285

Gln Val Leu Arg Asn Gln Gly Ser Ser Leu Leu Gly Arg Arg Gly Lys
290 295 300 

Asp Thr Ala Glu Gly Thr Asn Glu Asp Arg Gly Val Gly Gln Gly Glu
305 310 315 320

Gly Met Pro Ser Ser Asp Phe Thr Thr Glu 
325 330 



(278) INFORMATION FOR SEQ ID NO: 277:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2724 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277:

ATGGACACCT CCCGGCTCGG TGTGCTCCTG TCCTTGCCTG TGCTGCTGCA GCTGGCGACC 60

GGGGGCAGCT CTCCCAGGTC TGGTGTGTTG CTGAGGGGCT GCCCCACACA CTGTCATTGC 120

GAGCCCGACG GCAGGATGTT GCTCAGGGTG GACTGCTCCG ACCTGGGGCT CTCGGAGCTG 180

CCTTCCAACC TCAGCGTCTT CACCTCCTAC CTAGACCTCA GTATGAACAA CATCAGTCAG 240

CTGCTCCCGA ATCCCCTGCC CAGTCTCCGC TTCCTGGAGG AGTTACGTCT TGCGGGAAAC 300

GCTCTGACAT ACATTCCCAA GGGAGCATTC ACTGGCCTTT ACAGTCTTAA AGTTCTTATG 360

CTGCAGAATA ATCAGCTAAG ACACGTACCC ACAGAAGCTC TGCAGAATTT GCGAAGCCTT 420

CAATCCCTGC GTCTGGATGC TAACCACATC AGCTATGTGC CCCCAAGCTG TTTCAGTGGC 480

CTGCATTCCC TGAGGCACCT GTGGCTGGAT GACAATGCGT TAACAGAAAT CCCCGTCCAG 540

GCTTTTAGAA GTTTATCGGC ATTGCAAGCC ATGACCTTGG CCCTGAACAA AATACACCAC 600

ATACCAGACT ATGCCTTTGG AAACCTCTCC AGCTTGGTAG TTCTACATCT CCATAACAAT 660

AGAATCCACT CCCTGGGAAA GAAATGCTTT GATGGGCTCC ACAGCCTAGA GACTTTAGAT 720

TTAAATTACA ATAACCTTGA TGAATTCCCC ACTGCAATTA GGACACTCTC CAACCTTAAA 780

GAACTAGGAT TTCATAGCAA CAATATCAGG TCGATACCTG AGAAAGCATT TGTAGGCAAC 840

CCTTCTCTTA TTACAATACA TTTCTATGAC AATCCCATCC AATTTGTTGG GAGATCTGCT 900

TTTCAACATT TACCTGAACT AAGAACACTG ACTCTGAATG GTGCCTCACA AATAACTGAA 960

TTTCCTGATT TAACTGGAAC TGCAAACCTG GAGAGTCTGA CTTTAACTGG AGCACAGATC 1020

TCATCTCTTC CTCAAACCGT CTGCAATCAG TTACCTAATC TCCAAGTGCT AGATCTGTCT 1080

TACAACCTAT TAGAAGATTT ACCCAGTTTT TCAGTCTGCC AAAAGCTTCA GAAAATTGAC 1140

CTAAGACATA ATGAAATCTA CGAAATTAAA GTTGACACTT TCCAGCAGTT GCTTAGCCTC 1200

CGATCGCTGA ATTTGGCTTG GAACAAAATT GCTATTATTC ACCCCAATGC ATTTTCCACT 1260

TTGCCATCCC TAATAAAGCT GGACCTATCG TCCAACCTCC TGTCGTCTTT TCCTATAACT 1320

GGGTTACATG GTTTAACTCA CTTAAAATTA ACAGGAAATC ATGCCTTACA GAGCTTGATA 1380

TCATCTGAAA ACTTTCCAGA ACTCAAGGTT ATAGAAATGC CTTATGCTTA CCAGTGCTGT 1440

GCATTTGGAG TGTGTGAGAA TGCCTATAAG ATTTCTAATC AATGGAATAA AGGTGACAAC 1500

AGCAGTATGG ACGACCTTCA TAAGAAAGAT GCTGGAATGT TTCAGGCTCA AGATGAACGT 1560

GACCTTGAAG ATTTCCTGCT TGACTTTGAG GAAGACCTGA AAGCCCTTCA TTCAGTGCAG 1620

TGTTCACCTT CCCCAGGCCC CTTCAAACCC TGTGAACACC TGCTTGATGG CTGGCTGATC 1680

AGAATTGGAG TGTGGACCAT AGCAGTTCTG GCACTTACTT GTAATGCTTT GGTGACTTCA 1740

ACAGTTTTCA GATCCCCTCT GTACATTTCC CCCATTAAAC TGTTAATTGG GGTCATCGCA 1800

GCAGTGAACA TGCTCACGGG AGTCTCCAGT GCCGTGCTGG CTGGTGTGGA TGCGTTCACT 1860

TTTGGCAGCT TTGCACGACA TGGTGCCTGG TGGGAGAATG GGGTTGGTTG CCATGTCATT 1920

GGTTTTTTGT CCATTTTTGC TTCAGAATCA TCTGTTTTCC TGCTTACTCT GGCAGCCCTG 1980

GAGCGTGGGT TCTCTGTGAA ATATTCTGCA AAATTTGAAA CGAAAGCTCC ATTTTCTAGC 2040

CTGAAAGTAA TCATTTTGCT CTGTGCCCTG CTGGCCTTGA CCATGGCCGC AGTTCCCCTG 2100 

CTGGGTGGCA GCAAGTATGG CGCCTCCCCT CTCTGCCTGC CTTTGCCTTT TGGGGAGCCC 2160

AGCACCATGG GCTACATGGT CGCTCTCATC TTGCTCAATT CCCTTTGCTT CCTCATGATG 2220

ACCATTGCCT ACACCAAGCT CTACTGCAAT TTGGACAAGG GAGACCTGGA GAATATTTGG 2280 

GACTGCTCTA TGAAAAAACA CATTGCCCTG TTGCTCTTCA CCAACTGCAT CCTAAACTGC 2340

CCTGTGGCTT TCTTGTCCTT CTCCTCTTTA ATAAACCTTA CATTTATCAG TCCTGAAGTA 2400

ATTAAGTTTA TCCTTCTGGT GGTAGTCCCA CTTCCTGCAT GTCTCAATCC CCTTCTCTAC 2460

ATCTTGTTCA ATCCTCACTT TAAGGAGGAT CTGGTGAGCC TGAGAAAGCA AACCTACGTC 2520

TGGACAAGAT CAAAACACCC AAGCTTGATG TCAATTAACT CTGATGATGT CGAAAAACAG 2580

TCCTGTGACT CAACTCAAGC CTTGGTAACC TTTACCAGCT CCAGCATCAC TTATGACCTG 2640

CCTCCCAGTT CCGTGCCATC ACCAGCTTAT CCAGTGACTG AGAGCTGCCA TCTTTCCTCT 2700

GTGGCATTTG TCCCATGTCT CTTIA 2724
(279) INFORMATION FOR SEQ ID NO: 278:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 907 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278:

Met Asp Thr Ser Arg Leu Gly Val Leu Leu Ser Leu Pro Val Leu Leu 
1 5 10 15

Gln Leu Ala Thr Gly Gly Ser Ser Pro Arg Ser Gly Val Leu Leu Arg
20 25 30 

Gly Cys Pro Thr His Cys His Cys Glu Pro Asp Gly Arg Met Leu Leu 
35 40 45 

Arg Val Asp Cys Ser Asp Leu Gly Leu Ser Glu Leu Pro Ser Asn Leu
50 55 60 

Ser Val Phe Thr Ser Tyr Leu Asp Leu Ser Met Asn Asn Ile Ser Gln
65 70 75 80 

Leu Leu Pro Asn Pro Leu Pro Ser Leu Arg Phe Leu Glu Glu Leu Arg 
85 90 95

Leu Ala Gly Asn Ala Leu Thr Tyr Ile Pro Lys Gly Ala Phe Thr Gly
100 105 110

Leu Tyr Ser Leu Lys Val Leu Met Leu Gln Asn Asn Gln Leu Arg His
115 120 125

Val Pro Thr Glu Ala Leu Gln Asn Leu Arg Ser Leu Gln Ser Leu Arg
130 135 140 

Leu Asp Ala Asn His Ile Ser Tyr Val Pro Pro Ser Cys Phe Ser Gly
145 150 155 160 

Leu His Ser Leu Arg His Leu Trp Leu Asp Asp Asn Ala Leu Thr Glu 
165 170 175 

Ile Pro Val Gln Ala Phe Arg Ser Leu Ser Ala Leu Gln Ala Met Thr
180 185 190 

Leu Ala Leu Asn Lys Ile His His Ile Pro Asp Tyr Ala Phe Gly Asn
195 200 205 

Leu Ser Ser Leu Val Val Leu His Leu His Asn Asn Arg Ile His Ser
210 215 220 

Leu Gly Lys Lys Cys Phe Asp Gly Leu His Ser Leu Glu Thr Leu Asp 
225 230 235 240 

Leu Asn Tyr Asn Asn Leu Asp Glu Phe Pro Thr Ala Ile Arg Thr Leu
245 250 255 

Ser Asn Leu Lys Glu Leu Gly Phe His Ser Asn Asn Ile Arg Ser Ile
260 265 270 

Pro Glu Lys Ala Phe Val Gly Asn Pro Ser Leu Ile Thr Ile His Phe
275 280 285 

Tyr Asp Asn Pro Ile Gln Phe Val Gly Arg Ser Ala Phe Gln His Leu
290 295 300 

Pro Glu Leu Arg Thr Leu Thr Leu Asn Gly Ala Ser Gln Ile Thr Glu
305 310 315 320 

Phe Pro Asp Leu Thr Gly Thr Ala Asn Leu Glu Ser Leu Thr Leu Thr 
325 330 335 

Gly Ala Gln Ile Ser Ser Leu Pro Gln Thr Val Cys Asn Gln Leu Pro
340 345 350 

Asn Leu Gln Val Leu Asp Leu Ser Tyr Asn Leu Leu Glu Asp Leu Pro
355 360 365 

Ser Phe Ser Val Cys Gln Lys Leu Gln Lys Ile Asp Leu Arg His Asn
370 375 380 



Glu Ile Tyr Glu Ile Lys Val Asp Thr Phe Gln Gln Leu Leu Ser Leu
385 390 395 400

Arg Ser Leu Asn Leu Ala Trp Asn Lys Ile Ala Ile Ile His Pro Asn
405 410 415 

Ala Phe Ser Thr Leu Pro Ser Leu Ile Lys Leu Asp Leu Ser Ser Asn
420 425 430 

Leu Leu Ser Ser Phe Pro Ile Thr Gly Leu His Gly Leu Thr His Leu
435 440 445 

Lys Leu Thr Gly Asn His Ala Leu Gln Ser Leu Ile Ser Ser Glu Asn
450 455 460 

Phe Pro Glu Leu Lys Val Ile Glu Met Pro Tyr Ala Tyr Gln Cys Cys
465 470 475 480 

Ala Phe Gly Val Cys Glu Asn Ala Tyr Lys Ile Ser Asn Gln Trp Asn
485 490 495 

Lys Gly Asp Asn Ser Ser Met Asp Asp Leu His Lys Lys Asp Ala Gly 
500 505 510 

Met Phe Gln Ala Gln Asp Glu Arg Asp Leu Glu Asp Phe Leu Leu Asp
515 520 525 

Phe Glu Glu Asp Leu Lys Ala Leu His Ser Val Gln Cys Ser Pro Ser
530 535 540 

Pro Gly Pro Phe Lys Pro Cys Glu His Leu Leu Asp Gly Trp Leu Ile
545 550 555 560 

Arg Ile Gly Val Trp Thr Ile Ala Val Leu Ala Leu Thr Cys Asn Ala
565 570 575 

Leu Val Thr Ser Thr Val Phe Arg Ser Pro Leu Tyr Ile Ser Pro Ile
580 585 590 

Lys Leu Leu Ile Gly Val Ile Ala Ala Val Asn Met Leu Thr Gly Val
595 600 605 

Ser Ser Ala Val Leu Ala Gly Val Asp Ala Phe Thr Phe Gly Ser Phe 
610 615 620 

Ala Arg His Gly Ala Trp Trp Glu Asn Gly Val Gly Cys His Val Ile
625 630 635 640 

Gly Phe Leu Ser Ile Phe Ala Ser Glu Ser Ser Val Phe Leu Leu Thr
645 650 655 

Leu Ala Ala Leu Glu Arg Gly Phe Ser Val Lys Tyr Ser Ala Lys Phe 
660 665 670 



Glu Thr Lys Ala Pro Phe Ser Ser Leu Lys Val Ile Ile Leu Leu Cys
675 680 685

Ala Leu Leu Ala Leu Thr Met Ala Ala Val Pro Leu Leu Gly Gly Ser
690 695 700 

Lys Tyr Gly Ala Ser Pro Leu Cys Leu Pro Leu Pro Phe Gly Glu Pro 
705 710 715 720 

Ser Thr Met Gly Tyr Met Val Ala Leu Ile Leu Leu Asn Ser Leu Cys
725 730 735 

Phe Leu Met Met Thr Ile Ala Tyr Thr Lys Leu Tyr Cys Asn Leu Asp
740 745 750 

Lys Gly Asp Leu Glu Asn Ile Trp Asp Cys Ser Met Lys Lys His Ile
755 760 765 

Ala Leu Leu Leu Phe Thr Asn Cys Ile Leu Asn Cys Pro Val Ala Phe
770 775 780 

Leu Ser Phe Ser Ser Leu Ile Asn Leu Thr Phe Ile Ser Pro Glu Val
785 790 795 800 

Ile Lys Phe Ile Leu Leu Val Val Val Pro Leu Pro Ala Cys Leu Asn
805 810 815 

Pro Leu Leu Tyr Ile Leu Phe Asn Pro His Phe Lys Glu Asp Leu Val
820 825 830 

Ser Leu Arg Lys Gln Thr Tyr Val Trp Thr Arg Ser Lys His Pro Ser
835 840 845

Leu Met Ser Ile Asn Ser Asp Asp Val Glu Lys Gln Ser Cys Asp Ser
850 855 860 

Thr Gln Ala Leu Val Thr Phe Thr Ser Ser Ser Ile Thr Tyr Asp Leu
865 870 875 880 

Pro Pro Ser Ser Val Pro Ser Pro Ala Tyr Pro Val Thr Glu Ser Cys 
885 890 895

His Leu Ser Ser Val Ala Phe Val Pro Cys Leu 
900 905 

(280) INFORMATION FOR SEQ ID NO: 279:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 279:
CATGCCAACC GGCCCGCGAG GCTGCTGCTG GT 32

(281) INFORMATION FOR SEQ ID NO: 280:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280:
ACCAGCAGCA GCCTCGCGGG CCGGTTGGCA TG 
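
SEQ ID NOS: 279 and 280 appear to be exact reverse complements of each other, and SEQ ID NO: 279 differs from the matching 32-nucleotide stretch of SEQ ID NO: 269 only at two adjacent bases, which turn an AAG (Lys) codon into GCG (Ala). The Python sketch below checks both observations; the helper names are illustrative, and the code assumes the sequences contain only A, C, G and T.

```python
# Watson-Crick pairing for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(dna: str) -> str:
    """Remove the listing's block spacing and upper-case the bases."""
    return dna.replace(" ", "").upper()


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return clean(dna).translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> list[int]:
    """0-based positions at which two equal-length sequences differ."""
    a, b = clean(a), clean(b)
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq279 = "CATGCCAACC GGCCCGCGAG GCTGCTGCTG GT"
seq280 = "ACCAGCAGCA GCCTCGCGGG CCGGTTGGCA TG"
# The corresponding stretch of SEQ ID NO: 269 (from its line ending at 780).
seq269_stretch = "CATGCCAACCGGCCCAAGAGGCTGCTGCTGGT"

print(reverse_complement(seq279) == clean(seq280))  # True
print(mismatches(seq279, seq269_stretch))           # [15, 16]
```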